Method for determining copy number variations

ABSTRACT

The invention provides a method for determining copy number variations (CNV) of a sequence of interest in a test sample that comprises a mixture of nucleic acids that are known or are suspected to differ in the amount of one or more sequences of interest. The method comprises a statistical approach that accounts for accrued variability stemming from process-related, interchromosomal and inter-sequencing variability. The method is applicable to determining CNV of any fetal aneuploidy, and CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the present method include trisomies and monosomies of any one or more of chromosomes 1-22, X and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of the chromosomes, which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from sequencing information that is obtained by sequencing the nucleic acids of a test sample only once.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 15/659,523, filed on Jul. 25, 2017, which is a continuation of U.S. application Ser. No. 13/364,809, filed on Feb. 2, 2012, which is a continuation of U.S. application Ser. No. 13/191,366, filed on Jul. 26, 2011, which is a continuation-in-part of U.S. application Ser. No. 12/958,352, filed on Dec. 1, 2010, which claims priority to U.S. Provisional Application Ser. No. 61/407,017, filed on Oct. 26, 2010, which applications are incorporated by reference in their entirety.

FIELD OF THE INVENTION

The invention relates generally to the field of diagnostics, and provides a method for determining variations in the amount of nucleic acid sequences in a mixture of nucleic acids derived from different genomes. In particular, the method is applicable to the practice of noninvasive prenatal diagnostics, and to the diagnosis and monitoring of metastatic progression in cancer patients.

BACKGROUND OF THE INVENTION

One of the critical endeavors in human medical research is the discovery of genetic abnormalities that are central to adverse health consequences. In many cases, specific genes and/or critical diagnostic markers have been identified in portions of the genome that are present at abnormal copy numbers. For example, in prenatal diagnosis, extra or missing copies of whole chromosomes are frequently occurring genetic lesions. In cancer, deletion or multiplication of copies of whole chromosomes or chromosomal segments, and higher level amplifications of specific regions of the genome, are common occurrences.

Most information about copy number variation has been provided by cytogenetic resolution that has permitted recognition of structural abnormalities. Conventional procedures for genetic screening and biological dosimetry have utilized invasive procedures, e.g., amniocentesis, to obtain cells for the analysis of karyotypes. In response to the need for more rapid testing methods that do not require cell culture, fluorescence in situ hybridization (FISH), quantitative fluorescence PCR (QF-PCR) and array-Comparative Genomic Hybridization (array-CGH) have been developed as molecular-cytogenetic methods for the analysis of copy number variations.

The advent of technologies that allow for sequencing entire genomes in a relatively short time, and the discovery of circulating cell-free DNA (cfDNA), have provided the opportunity to compare genetic material originating from one chromosome to that of another without the risks associated with invasive sampling methods. However, the limitations of the existing methods, which include insufficient sensitivity stemming from the limited levels of cfDNA, and the sequencing bias of the technology stemming from the inherent nature of genomic information, underlie the continuing need for noninvasive methods that would provide any or all of the specificity, sensitivity, and applicability to reliably diagnose copy number changes in a variety of clinical settings.

The present invention fulfills some of the above needs and in particular offers an advantage in providing a reliable method that is applicable at least to the practice of noninvasive prenatal diagnostics, and to the diagnosis and monitoring of metastatic progression in cancer patients.

SUMMARY OF THE INVENTION

The invention provides a method for determining copy number variations (CNV) of a sequence of interest in a test sample that comprises a mixture of nucleic acids that are known or are suspected to differ in the amount of one or more sequences of interest. The method comprises a statistical approach that accounts for accrued variability stemming from process-related, interchromosomal and inter-sequencing variability. The method is applicable to determining CNV of any fetal aneuploidy, and CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the present method include trisomies and monosomies of any one or more of chromosomes 1-22, X and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of the chromosomes, which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from sequencing information that is obtained by sequencing the nucleic acids of a test sample only once.

In one embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. The steps of the method comprise (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X and Y and to identify a number of sequence tags for a normalizing chromosome sequence for each of the any four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the any four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the any four or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of the any four or more chromosomes of interest to a threshold value for each of the four or more chromosomes of interest, and thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the maternal test sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest and the number of sequence tags identified for the normalizing chromosome sequence for each of the chromosomes of interest.
In some other embodiments, step (c) comprises (i) calculating a sequence tag density ratio for each of the chromosomes of interest, by relating the number of sequence tags identified for each of the chromosomes of interest in step (b) to the length of each of the chromosomes of interest; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for the normalizing chromosome sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each of the chromosomes of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each of the chromosomes of interest and the sequence tag density ratio for the normalizing chromosome sequence for each of the chromosomes of interest.
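The two ways of calculating a single chromosome dose in step (c) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names are hypothetical, tag counts are assumed to be already available as integers, and the "sequence tag density ratio" is assumed to be the tag count divided by the sequence length in base pairs.

```python
def dose_from_counts(tags_of_interest: int, tags_normalizing: int) -> float:
    """Chromosome dose as the ratio of raw sequence tag counts."""
    if tags_normalizing <= 0:
        raise ValueError("the normalizing sequence must have at least one tag")
    return tags_of_interest / tags_normalizing


def dose_from_densities(tags_of_interest: int, length_of_interest: int,
                        tags_normalizing: int, length_normalizing: int) -> float:
    """Chromosome dose as the ratio of sequence tag density ratios.

    Assumption: a tag density ratio is the tag count divided by the
    sequence length in base pairs.
    """
    if min(length_of_interest, length_normalizing, tags_normalizing) <= 0:
        raise ValueError("lengths and the normalizing tag count must be positive")
    density_of_interest = tags_of_interest / length_of_interest
    density_normalizing = tags_normalizing / length_normalizing
    return density_of_interest / density_normalizing


# Hypothetical tag counts and lengths, for illustration only.
print(dose_from_counts(300, 1000))               # 0.3
print(dose_from_densities(300, 100, 1000, 500))  # 1.5
```

Because sequence lengths are fixed, the density-based dose differs from the count-based dose only by the constant factor length_normalizing / length_of_interest for a given pair of sequences, so either form can be compared with a threshold derived in the same way.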

In another embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. The steps of the method comprise (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X and Y and to identify a number of sequence tags for a normalizing chromosome sequence for each of the any four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the any four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the any four or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of the any four or more chromosomes of interest to a threshold value for each of the four or more chromosomes of interest, and thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the maternal test sample, wherein the any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest and the number of sequence tags identified for the normalizing chromosome sequence for each of the chromosomes of interest. In some other embodiments, step (c) comprises (i) calculating a sequence tag density ratio for each of the chromosomes of interest, by relating the number of sequence tags identified for each of the chromosomes of interest in step (b) to the length of each of the chromosomes of interest; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for the normalizing chromosome sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each of the chromosomes of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each of the chromosomes of interest and the sequence tag density ratio for the normalizing chromosome sequence for each of the chromosomes of interest.

In another embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. The steps of the method comprise (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X and Y and to identify a number of sequence tags for a normalizing chromosome sequence for each of the any four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the any four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the any four or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of the any four or more chromosomes of interest to a threshold value for each of the four or more chromosomes of interest, and thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the maternal test sample, wherein the any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest and the number of sequence tags identified for the normalizing chromosome sequence for each of the chromosomes of interest. In some other embodiments, step (c) comprises (i) calculating a sequence tag density ratio for each of the chromosomes of interest, by relating the number of sequence tags identified for each of the chromosomes of interest in step (b) to the length of each of the chromosomes of interest; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for the normalizing chromosome sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each of the chromosomes of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each of the chromosomes of interest and the sequence tag density ratio for the normalizing chromosome sequence for each of the chromosomes of interest.

In any of the embodiments above, the normalizing chromosome sequence may be a single chromosome selected from chromosomes 1-22, X, and Y. Alternatively, the normalizing chromosome sequence may be a group of chromosomes selected from chromosomes 1-22, X, and Y.
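When the normalizing chromosome sequence is a group of chromosomes, its tag count has to be combined across the group. The sketch below assumes, for illustration only, that the group's count is the sum of the tags of its member chromosomes; the text above does not state how a group is combined, and the dictionary layout and chromosome names are hypothetical.

```python
from typing import Iterable, Mapping


def dose_with_normalizing_group(tag_counts: Mapping[str, int],
                                chromosome_of_interest: str,
                                normalizing_group: Iterable[str]) -> float:
    """Chromosome dose using a single chromosome or a group as the normalizer.

    Assumption: the tag count of a normalizing group is the sum of the tag
    counts of its member chromosomes.
    """
    normalizing_tags = sum(tag_counts[c] for c in normalizing_group)
    if normalizing_tags <= 0:
        raise ValueError("the normalizing group must have at least one tag")
    return tag_counts[chromosome_of_interest] / normalizing_tags


# Hypothetical per-chromosome tag counts.
counts = {"chr21": 150, "chr9": 400, "chr14": 350, "chr1": 1200}
single = dose_with_normalizing_group(counts, "chr21", ["chr9"])            # 150 / 400
grouped = dose_with_normalizing_group(counts, "chr21", ["chr9", "chr14"])  # 150 / 750
```

A single normalizing chromosome is simply the one-member case of a group, so one function covers both alternatives.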

In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y and to identify a number of sequence tags for a normalizing segment sequence for each of any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome dose for each of any one or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of any one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.

In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence for each of the chromosomes of interest. In some other embodiments, step (c) comprises (i) calculating a sequence tag density ratio for each of the chromosomes of interest, by relating the number of sequence tags identified for each of the chromosomes of interest in step (b) to the length of each of the chromosomes of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for the normalizing segment sequence in step (b) to the length of each normalizing segment sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each of said chromosomes of interest, wherein said chromosome dose is calculated as the ratio of the sequence tag density ratio for each of the chromosomes of interest and the sequence tag density ratio for the normalizing segment sequence for each of the chromosomes of interest.
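Using a normalizing segment sequence in step (b) means counting tags both over a whole chromosome and over a segment. The following is a minimal sketch assuming that each mapped sequence tag is a hypothetical (chromosome, position) tuple and that a segment is a half-open interval [start, end) on one chromosome; a real pipeline would read mapped tags from alignment output instead.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical representation of a mapped sequence tag: (chromosome, 0-based position).
Tag = Tuple[str, int]
# Hypothetical representation of a segment: (chromosome, start, end), end exclusive.
Segment = Tuple[str, int, int]


def count_tags_per_chromosome(tags: Iterable[Tag]) -> Counter:
    """Number of sequence tags mapped to each chromosome."""
    return Counter(chrom for chrom, _ in tags)


def count_tags_in_segment(tags: Iterable[Tag], segment: Segment) -> int:
    """Number of sequence tags whose position falls inside the segment."""
    chrom, start, end = segment
    return sum(1 for c, pos in tags if c == chrom and start <= pos < end)


def dose_with_normalizing_segment(tags: Iterable[Tag], chromosome_of_interest: str,
                                  normalizing_segment: Segment) -> float:
    """Chromosome dose: tags on the chromosome divided by tags in the normalizing segment."""
    tag_list: List[Tag] = list(tags)  # materialize so the tags can be scanned twice
    tags_of_interest = count_tags_per_chromosome(tag_list)[chromosome_of_interest]
    tags_normalizing = count_tags_in_segment(tag_list, normalizing_segment)
    if tags_normalizing == 0:
        raise ValueError("no tags fall inside the normalizing segment")
    return tags_of_interest / tags_normalizing


# Hypothetical mapped tags.
tags = [("chr21", 10), ("chr21", 20), ("chr21", 30),
        ("chr1", 5), ("chr1", 15), ("chr1", 25), ("chr1", 35)]
dose = dose_with_normalizing_segment(tags, "chr21", ("chr1", 10, 30))  # 3 / 2
```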

In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y and to identify a number of sequence tags for a normalizing segment sequence for each of any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome dose for each of any one or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of any one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample, wherein the any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence for each of the chromosomes of interest. In some other embodiments, step (c) comprises (i) calculating a sequence tag density ratio for each of the chromosomes of interest, by relating the number of sequence tags identified for each of the chromosomes of interest in step (b) to the length of each of the chromosomes of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for the normalizing segment sequence in step (b) to the length of each normalizing segment sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each of said chromosomes of interest, wherein said chromosome dose is calculated as the ratio of the sequence tag density ratio for each of the chromosomes of interest and the sequence tag density ratio for the normalizing segment sequence for each of the chromosomes of interest.

In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y and to identify a number of sequence tags for a normalizing segment sequence for each of any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome dose for each of any one or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of any one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample, wherein the any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence for each of the chromosomes of interest.
In some other embodiments, step (c) comprises (i) calculating a sequence tag density ratio for each of the chromosomes of interest, by relating the number of sequence tags identified for each of the chromosomes of interest in step (b) to the length of each of the chromosomes of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for the normalizing segment sequence in step (b) to the length of each normalizing segment sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each of said chromosomes of interest, wherein said chromosome dose is calculated as the ratio of the sequence tag density ratio for each of the chromosomes of interest and the sequence tag density ratio for the normalizing segment sequence for each of the chromosomes of interest.

In any one of the embodiments above, the different complete chromosomal aneuploidies are selected from complete chromosomal trisomies, complete chromosomal monosomies and complete chromosomal polysomies. The different complete chromosomal aneuploidies are selected from complete aneuploidies of any one of chromosomes 1-22, X, and Y. For example, the different complete fetal chromosomal aneuploidies are selected from trisomy 2, trisomy 8, trisomy 9, trisomy 21, trisomy 13, trisomy 16, trisomy 18, trisomy 22, 47,XXY, 47,XXX, 47,XYY, and monosomy X.

In any one of the embodiments above, steps (a)-(d) are repeated for test samples from different maternal subjects, and the method comprises determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in each of the test samples.

In any one of the embodiments above, the method can further comprise calculating a normalized chromosome value (NCV), wherein the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:

${NCV_{ij}} = \frac{x_{ij} - {\hat{\mu}}_{j}}{{\hat{\sigma}}_{j}}$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
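The NCV formula can be computed directly from the doses of one chromosome observed in a set of qualified samples. In this sketch the standard deviation is estimated with the sample standard deviation (`statistics.stdev`, n - 1 denominator); the estimator is an assumption, since the formula above does not fix one, and the dose values are hypothetical.

```python
import statistics
from typing import Sequence


def normalized_chromosome_value(test_dose: float,
                                qualified_doses: Sequence[float]) -> float:
    """NCV_ij = (x_ij - mu_j) / sigma_j for one test sample i and chromosome j.

    mu_j and sigma_j are estimated from the doses of chromosome j in the
    qualified samples; sigma_j uses the sample standard deviation (an assumption).
    """
    if len(qualified_doses) < 2:
        raise ValueError("at least two qualified samples are needed to estimate sigma")
    mu = statistics.mean(qualified_doses)
    sigma = statistics.stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualified doses show no variation")
    return (test_dose - mu) / sigma


# Hypothetical chromosome doses from three qualified samples and one test sample.
qualified = [0.29, 0.30, 0.31]
ncv = normalized_chromosome_value(0.33, qualified)  # approximately 3.0
```

In this example the test dose lies about three standard deviations above the qualified mean; the NCV cutoff used to call an aneuploidy is a separate choice that the sketch does not make.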

In another embodiment, a method is provided for determining the presence or absence of different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and to identify a number of sequence tags for a normalizing segment sequence for each of any one or more segments of any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of any one or more segments of any one or more chromosomes of interest and said number of sequence tags identified for the normalizing segment sequence to calculate a single segment dose for each of said any one or more segments of any one or more chromosomes of interest; and (d) comparing each of the single segment doses for each of any one or more segments of any one or more chromosomes of interest to a threshold value for each of any one or more chromosomal segments of any one or more chromosomes of interest, and thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in the sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of a test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.

In some embodiments, step (c) comprises calculating a single segment dose for each of any one or more segments of any one or more chromosomes of interest as the ratio of the number of sequence tags identified for each of any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence for each of the any one or more segments of any one or more chromosomes of interest. In some other embodiments, step (c) comprises (i) calculating a sequence tag density ratio for each segment of interest, by relating the number of sequence tags identified for each segment of interest in step (b) to the length of each segment of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for the normalizing segment sequence in step (b) to the length of each normalizing segment sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single segment dose for each segment of interest, wherein the segment dose is calculated as the ratio of the sequence tag density ratio for each of the segments of interest and the sequence tag density ratio for the normalizing segment sequence for each of the segments of interest. The method can further comprise calculating a normalized segment value (NSV), wherein the NSV relates said segment dose to the mean of the corresponding segment dose in a set of qualified samples as:

${NSV_{ij}} = \frac{x_{ij} - {\hat{\mu}}_{j}}{{\hat{\sigma}}_{j}}$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
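The NSV can be computed segment by segment in the same way. The sketch below, with hypothetical segment names, lengths and tag counts, first computes each segment dose from tag density ratios (assumed to be tags per base pair) and then computes an NSV for every segment j of one test sample against the qualified samples; the sample standard deviation is again an assumed estimator.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical per-sample layout: segment name -> (tags in segment, tags in its normalizing segment).
SampleCounts = Dict[str, Tuple[int, int]]


def segment_doses(sample: SampleCounts, segment_lengths: Dict[str, int],
                  normalizing_lengths: Dict[str, int]) -> Dict[str, float]:
    """Segment dose = tag density of the segment / tag density of its normalizing segment."""
    doses = {}
    for name, (tags_segment, tags_normalizing) in sample.items():
        density_segment = tags_segment / segment_lengths[name]
        density_normalizing = tags_normalizing / normalizing_lengths[name]
        doses[name] = density_segment / density_normalizing
    return doses


def normalized_segment_values(test: SampleCounts, qualified: List[SampleCounts],
                              segment_lengths: Dict[str, int],
                              normalizing_lengths: Dict[str, int]) -> Dict[str, float]:
    """NSV_ij = (x_ij - mu_j) / sigma_j for every segment j of one test sample i."""
    test_doses = segment_doses(test, segment_lengths, normalizing_lengths)
    qualified_doses = [segment_doses(q, segment_lengths, normalizing_lengths)
                       for q in qualified]
    nsv = {}
    for name, x in test_doses.items():
        reference = [d[name] for d in qualified_doses]
        nsv[name] = (x - statistics.mean(reference)) / statistics.stdev(reference)
    return nsv


# Hypothetical data for one segment of interest.
lengths = {"segA": 1000}
normalizing_lengths = {"segA": 2000}
qualified_samples = [{"segA": (10, 40)}, {"segA": (11, 40)}, {"segA": (9, 40)}]
test_sample = {"segA": (13, 40)}
values = normalized_segment_values(test_sample, qualified_samples,
                                   lengths, normalizing_lengths)
```

Here the qualified segment doses are 0.45, 0.5 and 0.55, and the test dose of 0.65 yields an NSV of about 3.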

In embodiments of the method described above in which a chromosome dose or a segment dose is determined using a normalizing segment sequence, the normalizing segment sequence may be a single segment of any one or more of chromosomes 1-22, X, and Y. Alternatively, the normalizing segment sequence may be a group of segments of any one or more of chromosomes 1-22, X, and Y.

Steps (a)-(d) of the method for determining the presence or absence of a partial fetal chromosomal aneuploidy are repeated for test samples from different maternal subjects, and the method comprises determining the presence or absence of different partial fetal chromosomal aneuploidies in each of said samples. Partial fetal chromosomal aneuploidies that can be determined according to the method include partial aneuploidies of any segment of any chromosome. The partial aneuploidies can be selected from partial duplications, partial multiplications, partial insertions and partial deletions. Examples of partial aneuploidies that can be determined according to the method include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22.

In any one of the embodiments described above, the test sample may be a maternal sample selected from blood, plasma, serum, urine and saliva samples. In any one of the embodiments, the test sample may be a plasma sample. The nucleic acid molecules of the maternal sample are a mixture of fetal and maternal cell-free DNA molecules. Sequencing of the nucleic acids can be performed using next generation sequencing (NGS). In some embodiments, sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, sequencing is sequencing-by-ligation. In yet other embodiments, sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.

In another embodiment, a method is provided for determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in a maternal plasma test sample comprising a mixture of fetal and maternal cell-free DNA molecules. The steps of the method comprise: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y and to identify a number of sequence tags for a normalizing chromosome for each of said twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of the twenty or more chromosomes of interest to a threshold value for each of the twenty or more chromosomes of interest, and thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in the sample.
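Step (d) compares each chromosome dose with a threshold specific to that chromosome. The sketch below assumes, hypothetically, that each chromosome has a lower and an upper threshold (for example derived from qualified samples) and reports one call per chromosome; the threshold values, the `min_chromosomes` guard and the wording of the calls are placeholders, not values from the method described above.

```python
from typing import Dict, Tuple

CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def call_chromosomes(doses: Dict[str, float],
                     thresholds: Dict[str, Tuple[float, float]],
                     min_chromosomes: int = 20) -> Dict[str, str]:
    """Compare each chromosome dose with its own (lower, upper) threshold pair."""
    if len(doses) < min_chromosomes:
        raise ValueError(f"expected doses for at least {min_chromosomes} chromosomes")
    calls = {}
    for chrom, dose in doses.items():
        lower, upper = thresholds[chrom]
        if dose > upper:
            calls[chrom] = "gain"    # e.g. consistent with a trisomy
        elif dose < lower:
            calls[chrom] = "loss"    # e.g. consistent with a monosomy
        else:
            calls[chrom] = "no aneuploidy detected"
    return calls


# Hypothetical doses and thresholds for all 24 chromosomes.
doses = {c: 1.0 for c in CHROMOSOMES}
doses["chr21"] = 1.2
thresholds = {c: (0.9, 1.1) for c in CHROMOSOMES}
calls = call_chromosomes(doses, thresholds)
```

Keeping a separate threshold pair per chromosome reflects the per-chromosome comparison in step (d); the same loop applies unchanged whether the compared quantity is a raw dose or an NCV.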

In another embodiment, the invention provides a method for identifying copy number variation (CNV) of a sequence of interest, e.g., a clinically relevant sequence, in a test sample comprising the steps of: (a) obtaining a test sample and a plurality of qualified samples, said test sample comprising test nucleic acid molecules and said plurality of qualified samples comprising qualified nucleic acid molecules; (b) obtaining sequence information for said fetal and maternal nucleic acids in said sample; (c) based on said sequencing of said qualified nucleic acid molecules, calculating a qualified sequence dose for said qualified sequence of interest in each of said plurality of qualified samples, wherein said calculating a qualified sequence dose comprises determining a parameter for said qualified sequence of interest and at least one qualified normalizing sequence; (d) based on said qualified sequence dose, identifying at least one qualified normalizing sequence, wherein said at least one qualified normalizing sequence has the smallest variability and/or the greatest differentiability in sequence dose in said plurality of qualified samples; (e) based on said sequencing of said nucleic acid molecules in said test sample, calculating a test sequence dose for said test sequence of interest, wherein said calculating a test sequence dose comprises determining a parameter for said test sequence of interest and at least one normalizing test sequence, and wherein said at least one normalizing test sequence corresponds to said at least one qualified normalizing sequence; (f) comparing said test sequence dose to at least one threshold value; and (g) assessing said copy number variation of said sequence of interest in said test sample based on the outcome of step (f).
In one embodiment, the parameter for said qualified sequence of interest and at least one qualified normalizing sequence relates the number of sequence tags mapped to said qualified sequence of interest to the number of tags mapped to said qualified normalizing sequence, and said parameter for said test sequence of interest and at least one normalizing test sequence relates the number of sequence tags mapped to said test sequence of interest to the number of tags mapped to said normalizing test sequence. In some embodiments, step (b) comprises sequencing at least a portion of the qualified and test nucleic acid molecules, wherein sequencing comprises providing a plurality of mapped sequence tags for a test and a qualified sequence of interest, and for at least one test and at least one qualified normalizing sequence, and sequencing at least a portion of said nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, the sequencing step is performed using a next generation sequencing method. In some embodiments, the sequencing method may be a massively parallel sequencing method that uses sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing method is sequencing-by-ligation. In some embodiments, sequencing comprises an amplification. In other embodiments, sequencing is single molecule sequencing. The CNV of a sequence of interest is an aneuploidy, which can be a chromosomal or a partial aneuploidy. In some embodiments, the chromosomal aneuploidy is selected from trisomy 2, trisomy 8, trisomy 9, trisomy 16, trisomy 21, trisomy 13, trisomy 18, trisomy 22, 47,XXY, 47,XXX, 47,XYY, and monosomy X. In other embodiments, the partial aneuploidy is a partial chromosomal deletion or a partial chromosomal insertion. In some embodiments, the CNV identified by the method is a chromosomal or partial aneuploidy associated with cancer.
In some embodiments, the test and qualified samples are biological fluid samples, e.g. plasma samples, obtained from a pregnant subject such as a pregnant human subject. In other embodiments, the test and qualified biological fluid samples, e.g. plasma samples, are obtained from a subject that is known or is suspected of having cancer.
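Step (d) chooses the qualified normalizing sequence whose sequence dose varies least across the qualified samples. The sketch below illustrates that choice under several assumptions. Variability is measured as the coefficient of variation (CV), meaning standard deviation divided by mean, of the qualified sequence dose. The parameter of step (c) is the simple ratio of tag counts. Tag counts for each qualified sample are held in hypothetical dictionaries. The sketch ignores the alternative "greatest differentiability" criterion, which would also need affected samples. `select_normalizing_chromosome` is an illustrative name.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional


def coefficient_of_variation(values: List[float]) -> float:
    """Standard deviation divided by mean (needs at least two values)."""
    return stdev(values) / mean(values)


def select_normalizing_chromosome(
    qualified_counts: List[Dict[str, int]],
    chrom_of_interest: str,
    candidates: List[str],
) -> Optional[str]:
    """Return the candidate whose qualified sequence dose varies least (smallest CV)."""
    best: Optional[str] = None
    best_cv = float("inf")
    for candidate in candidates:
        # Qualified sequence dose of the chromosome of interest in every qualified sample.
        doses = [sample[chrom_of_interest] / sample[candidate] for sample in qualified_counts]
        cv = coefficient_of_variation(doses)
        if cv < best_cv:
            best, best_cv = candidate, cv
    return best
```

The chromosome selected here would then serve as the normalizing test sequence in step (e), so that test and qualified doses are computed in the same way.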

Although the examples herein concern humans and the language is primarily directed to human concerns, the concept of this invention is applicable to genomes from any plant or animal.

INCORPORATION BY REFERENCE

All patents, patent applications, and other publications, including all sequences disclosed within these references, referred to herein are expressly incorporated by reference, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference. All documents cited are, in relevant part, incorporated herein by reference. However, the citation of any document is not to be construed as an admission that it is prior art with respect to the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1 is a flowchart of a method 100 for determining the presence or absence of a copy number variation in a test sample comprising a mixture of nucleic acids.

FIGS. 2A and 2B illustrate the distribution of the chromosome dose for chromosome 21 determined from sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome 21 doses for qualified, i.e. normal for chromosome 21 (O), and trisomy 21 (Δ) test samples are shown for chromosomes 1-12 and X (FIG. 2A), and for chromosomes 1-22 and X (FIG. 2B).

FIGS. 3A and 3B illustrate the distribution of the chromosome dose for chromosome 18 determined from sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome 18 doses for qualified, i.e. normal for chromosome 18 (O), and trisomy 18 (Δ) test samples are shown for chromosomes 1-12 and X (FIG. 3A), and for chromosomes 1-22 and X (FIG. 3B).

FIGS. 4A and 4B illustrate the distribution of the chromosome dose for chromosome 13 determined from sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome 13 doses for qualified, i.e. normal for chromosome 13 (O), and trisomy 13 (Δ) test samples are shown for chromosomes 1-12 and X (FIG. 4A), and for chromosomes 1-22 and X (FIG. 4B).

FIGS. 5A and 5B illustrate the distribution of the chromosome doses for chromosome X determined from sequencing cfDNA extracted from a set of 48 test blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome X doses for males (46,XY; (O)), females (46,XX; (Δ)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 and X (FIG. 5A), and for chromosomes 1-22 and X (FIG. 5B).

FIGS. 6A and 6B illustrate the distribution of the chromosome doses for chromosome Y determined from sequencing cfDNA extracted from a set of 48 test blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome Y doses for males (46,XY; (Δ)), females (46,XX; (O)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 (FIG. 6A), and for chromosomes 1-22 (FIG. 6B).

FIG. 7 shows the coefficient of variation (CV) for chromosomes 21 (▪), 18 (●) and 13 (▴) that was determined from the doses shown in FIGS. 2, 3, and 4, respectively.

FIG. 8 shows the coefficient of variation (CV) for chromosomes X (▪) and Y (●) that was determined from the doses shown in FIGS. 5 and 6, respectively.

FIG. 9 shows the cumulative distribution of GC fraction by human chromosome. The vertical axis represents the frequency of the chromosome with GC content below the value shown on the horizontal axis.

FIG. 10 illustrates the sequence doses (Y-axis) for a segment of chromosome 11 (81000082-103000103 bp) determined from sequencing cfDNA extracted from a set of 7 qualified samples (O) and 1 test sample (♦) obtained from pregnant human subjects. A sample from a subject carrying a fetus with a partial aneuploidy of chromosome 11 (♦) was identified.

FIGS. 11A-11E illustrate the distribution of normalized chromosome doses for chromosome 21 (FIG. 11A), chromosome 18 (FIG. 11B), chromosome 13 (FIG. 11C), chromosome X (FIG. 11D) and chromosome Y (FIG. 11E) relative to the standard deviation of the mean (Y-axis) for the corresponding chromosomes in the unaffected samples.

FIG. 12 shows normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 (□) determined in samples from training set 1 using normalizing chromosomes as described in Example 6.

FIG. 13 shows normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 (□) determined in samples from test set 1 using normalizing chromosomes as described in Example 6.

FIG. 14 shows normalized chromosome values for chromosomes 21 (O) and 18 (Δ) determined in samples from test set 1 using the normalizing method of Chiu et al. (normalizes the number of sequence tags identified for the chromosome of interest with the number of sequence tags obtained for the remaining chromosomes in the sample; see Example 7 herein).

FIG. 15 shows normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 (□) determined in samples from training set 1 using systematically determined normalizing chromosomes (as described in Example 7).

FIG. 16 shows normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 (□) determined in samples from test set 1 using systematically determined normalizing chromosomes (as described in Example 7).

FIG. 17 shows normalized chromosome values for chromosome 9 (O) determined in samples from test set 1 using systematically determined normalizing chromosomes (as described in Example 7).

FIGS. 18A and 18B show normalized chromosome values for chromosomes X (X-axis) and Y (Y-axis). The arrows point to the 5 (FIG. 18A) and 3 (FIG. 18B) monosomy X samples that were identified in the training and test sets, respectively, as described in Example 7.

FIG. 19 shows normalized chromosome values for chromosomes 1-22 determined in samples from test set 1 using systematically determined normalizing chromosomes (as described in Example 7).

DETAILED DESCRIPTION OF THE INVENTION

The invention provides a method for determining copy number variations (CNV) of a sequence of interest in a test sample that comprises a mixture of nucleic acids that are known or are suspected to differ in the amount of one or more sequence of interest. Sequences of interest include genomic sequences ranging from kilobases (kb) to megabases (Mb) to entire chromosomes that are known or are suspected to be associated with a genetic or a disease condition. Examples of sequences of interest include chromosomes associated with well-known aneuploidies, e.g. trisomy 21, and segments of chromosomes that are multiplied in diseases such as cancer, e.g. partial trisomy 8 in acute myeloid leukemia. CNV that can be determined according to the present method include monosomies and trisomies of any one or more of autosomes 1-22 and of sex chromosomes X and Y, e.g. 45,X, 47,XXX, 47,XXY and 47,XYY, other chromosomal polysomies, i.e. tetrasomies and pentasomies including but not limited to XXXX, XXXXX, XXXXY and XYYYY, and deletions and/or duplications of segments of any one or more of the chromosomes.

The method comprises a statistical approach that accounts for accrued variability stemming from process-related, interchromosomal (intra-run), and inter-sequencing (inter-run) variability. The method is applicable to determining CNV of any fetal aneuploidy, and CNVs known or suspected to be associated with a variety of medical conditions.

Unless otherwise indicated, the practice of the present invention involves conventional techniques commonly used in molecular biology, microbiology, protein purification, protein engineering, protein and DNA sequencing, and recombinant DNA fields, which are within the skill of the art. Such techniques are known to those of skill in the art and are described in numerous texts and reference works (see e.g., Sambrook et al., “Molecular Cloning: A Laboratory Manual”, Third Edition (Cold Spring Harbor) [2001]; and Ausubel et al., “Current Protocols in Molecular Biology” [1987]).

Numeric ranges are inclusive of the numbers defining the range. It is intended that every maximum numerical limitation given throughout this specification includes every lower numerical limitation, as if such lower numerical limitations were expressly written herein. Every minimum numerical limitation given throughout this specification will include every higher numerical limitation, as if such higher numerical limitations were expressly written herein. Every numerical range given throughout this specification will include every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein.

The headings provided herein are not limitations of the various aspects or embodiments of the invention, which can be had by reference to the Specification as a whole. Accordingly, as indicated above, the terms defined immediately below are more fully defined by reference to the specification as a whole.

Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Various scientific dictionaries that include the terms included herein are well known and available to those in the art. Although any methods and materials similar or equivalent to those described herein find use in the practice or testing of the present invention, some preferred methods and materials are described. Accordingly, the terms defined immediately below are more fully described by reference to the Specification as a whole. It is to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary, depending upon the context in which they are used by those of skill in the art.

Definitions

As used herein, the singular terms “a”, “an,” and “the” include the plural reference unless the context clearly indicates otherwise. Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation and amino acid sequences are written left to right in amino to carboxy orientation, respectively.

The term “assessing” herein refers to characterizing the status of a chromosomal aneuploidy by one of three types of calls: “normal”, “affected”, and “no-call”. For example, in the presence of trisomy, the “normal” call is determined by the value of a parameter, e.g. a test chromosome dose, that is below a user-defined threshold of reliability; the “affected” call is determined by a parameter, e.g. a test chromosome dose, that is above a user-defined threshold of reliability; and the “no-call” result is determined by a parameter, e.g. a test chromosome dose, that lies between the user-defined thresholds of reliability for making a “normal” or an “affected” call.
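For a gain such as a trisomy, the three-way call can be sketched as a comparison against two user-defined thresholds. The sketch assumes a single test chromosome dose and two hypothetical parameters, `normal_max` and `affected_min`, with `normal_max <= affected_min`. A value that falls exactly on a threshold is treated as a no-call. That boundary choice belongs to this sketch; the definition above does not specify it.

```python
def assess(dose: float, normal_max: float, affected_min: float) -> str:
    """Classify a test chromosome dose for a gain (e.g. a trisomy).

    normal_max and affected_min are hypothetical user-defined thresholds of reliability.
    """
    if normal_max > affected_min:
        raise ValueError("normal_max must not exceed affected_min")
    if dose < normal_max:
        return "normal"
    if dose > affected_min:
        return "affected"
    # Doses between (or on) the two thresholds are not reliable enough to call.
    return "no-call"
```

For a loss such as a monosomy, the comparisons would be mirrored: a dose above the upper threshold would be "normal" and a dose below the lower threshold would be "affected".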

The term “copy number variation” herein refers to variation in the number of copies of a nucleic acid sequence that is 1 kb or larger present in a test sample in comparison with the copy number of the nucleic acid sequence present in a qualified sample. A “copy number variant” refers to the 1 kb or larger sequence of nucleic acid in which copy-number differences are found by comparison of a sequence of interest in a test sample with that present in a qualified sample. Copy number variants/variations include deletions, including microdeletions, insertions, including microinsertions, duplications, multiplications, inversions, translocations and complex multi-site variants. CNV encompass chromosomal aneuploidies and partial aneuploidies.

The term “aneuploidy” herein refers to an imbalance of genetic material caused by a loss or gain of a whole chromosome, or part of a chromosome.

The terms “chromosomal aneuploidy” and “complete chromosomal aneuploidy” herein refer to an imbalance of genetic material caused by a loss or gain of a whole chromosome, and include germline aneuploidy and mosaic aneuploidy.

The terms “partial aneuploidy” and “partial chromosomal aneuploidy” herein refer to an imbalance of genetic material caused by a loss or gain of part of a chromosome, e.g. partial monosomy and partial trisomy, and encompass imbalances resulting from translocations, deletions and insertions.

The term “aneuploid sample” herein refers to a sample indicative of a subject whose chromosomal content is not euploid, i.e. the sample is indicative of a subject with an abnormal copy number of chromosomes.

The term “aneuploid chromosome” herein refers to a chromosome that is known or determined to be present in a sample in an abnormal copy number.

The term “plurality” is used herein in reference to a number of nucleic acid molecules or sequence tags that is sufficient to identify significant differences in copy number variations (e.g. chromosome doses) in test samples and qualified samples used in the methods of the invention. In some embodiments, at least about 3×10⁶ sequence tags, at least about 5×10⁶ sequence tags, at least about 8×10⁶ sequence tags, at least about 10×10⁶ sequence tags, at least about 15×10⁶ sequence tags, at least about 20×10⁶ sequence tags, at least about 30×10⁶ sequence tags, at least about 40×10⁶ sequence tags, or at least about 50×10⁶ sequence tags comprising between 20 and 40 bp reads are obtained for each test sample.

The terms “polynucleotide”, “nucleic acid” and “nucleic acid molecules” are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e., ribonucleotides for RNA and deoxyribonucleotides for DNA) in which the 3′ position of the pentose of one nucleotide is joined by a phosphodiester group to the 5′ position of the pentose of the next. The terms include sequences of any form of nucleic acid, including, but not limited to, RNA, DNA and cfDNA molecules. The term “polynucleotide” includes, without limitation, single- and double-stranded polynucleotides.

The term “portion” is used herein in reference to the amount of sequence information of fetal and maternal nucleic acid molecules in a biological sample that in sum amounts to less than the sequence information of one human genome.

The term “test sample” herein refers to a sample comprising a mixture of nucleic acids comprising at least one nucleic acid sequence whose copy number is suspected of having undergone variation. Nucleic acids present in a test sample are referred to as “test nucleic acids”.

The term “qualified sample” herein refers to a sample comprising a mixture of nucleic acids that are present in a known copy number to which the nucleic acids in a test sample are compared. A qualified sample is normal, i.e. not aneuploid, for the sequence of interest; e.g. a qualified sample used for identifying a normalizing chromosome for chromosome 21 is a sample that is not a trisomy 21 sample.

The term “training set” herein refers to a set of samples that can comprise affected and unaffected samples. The unaffected samples in a training set are used as the qualified samples to identify normalizing sequences, e.g. normalizing chromosomes, and the chromosome doses of unaffected samples are used to set the thresholds for each of the sequences, e.g. chromosomes, of interest. The affected samples in a training set can be used to verify that affected test samples can be easily differentiated from unaffected samples.

The term “qualified nucleic acid” is used interchangeably with “qualified sequence” and refers to a sequence against which the amount of a test sequence or test nucleic acid is compared. A qualified sequence is one present in a biological sample, preferably at a known representation, i.e. the amount of a qualified sequence is known. A “qualified sequence of interest” is a qualified sequence for which the amount is known in a qualified sample, and is a sequence that is associated with a difference in sequence representation in an individual with a medical condition.

The term “sequence of interest” herein refers to a nucleic acid sequence that is associated with a difference in sequence representation in healthy versus diseased individuals. A sequence of interest can be a sequence on a chromosome that is misrepresented, i.e. over- or under-represented, in a disease or genetic condition. A sequence of interest may also be a portion of a chromosome, i.e. a chromosome segment, or a chromosome. For example, a sequence of interest can be a chromosome that is over-represented in an aneuploidy condition, or a gene encoding a tumor-suppressor that is under-represented in a cancer. Sequences of interest include sequences that are over- or under-represented in the total population, or a subpopulation of cells of a subject. A “qualified sequence of interest” is a sequence of interest in a qualified sample. A “test sequence of interest” is a sequence of interest in a test sample.

The term “normalizing sequence” herein refers to a sequence that displays a variability in the number of sequence tags that are mapped to it among samples and sequencing runs that best approximates that of the sequence of interest for which it is used as a normalizing parameter, and that can best differentiate an affected sample from one or more unaffected samples. A “normalizing chromosome” or “normalizing chromosome sequence” is an example of a “normalizing sequence”. A “normalizing chromosome sequence” can be composed of a single chromosome or of a group of chromosomes. A “normalizing segment” is another example of a “normalizing sequence”. A “normalizing segment sequence” can be composed of a single segment of a chromosome or it can be composed of two or more segments of the same or of different chromosomes.

The term “differentiability” herein refers to the characteristic of a normalizing chromosome that enables one to distinguish one or more unaffected, i.e. normal, samples from one or more affected, i.e. aneuploid, samples.
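The definition above does not fix a numerical measure of differentiability. One possible measure is used here purely as an illustrative assumption. It takes the distance between the mean dose of the affected samples and the mean dose of the unaffected samples, and expresses that distance in units of the standard deviation of the unaffected doses. A normalizing chromosome with a larger score separates the two groups more clearly. `differentiability_score` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Sequence


def differentiability_score(unaffected: Sequence[float], affected: Sequence[float]) -> float:
    """Separation of affected from unaffected doses, in unaffected-SD units.

    Requires at least two unaffected doses with non-zero spread.
    """
    sd = stdev(unaffected)
    if sd == 0:
        raise ValueError("unaffected doses have zero spread")
    # abs() lets the same score describe gains (trisomies) and losses (monosomies).
    return abs(mean(affected) - mean(unaffected)) / sd
```

To compare candidate normalizing chromosomes, this score would be computed once per candidate, using the doses each candidate produces.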

The term “sequence dose” herein refers to a parameter that relates the sequence tag density of a sequence of interest to the tag density of a normalizing sequence. A “test sequence dose” is a parameter that relates the sequence tag density of a sequence of interest, e.g. chromosome 21, to that of a normalizing sequence, e.g. chromosome 9, determined in a test sample. Similarly, a “qualified sequence dose” is a parameter that relates the sequence tag density of a sequence of interest to that of a normalizing sequence determined in a qualified sample.
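Because sequence tag density is defined here as a count of mapped reads, a sequence dose reduces to a ratio of tag counts. The sketch assumes tag counts keyed by chromosome name. Following the definition of a normalizing chromosome sequence, the normalizing sequence may be either a single chromosome or a group of chromosomes whose tags are summed. The function name `sequence_dose` is hypothetical.

```python
from typing import Iterable, Mapping


def sequence_dose(
    tag_counts: Mapping[str, int],
    sequence_of_interest: str,
    normalizing_sequence: Iterable[str],
) -> float:
    """Tags on the sequence of interest divided by tags on the normalizing sequence.

    normalizing_sequence lists one or more chromosomes; their tag counts are summed.
    """
    denominator = sum(tag_counts[name] for name in normalizing_sequence)
    if denominator == 0:
        raise ValueError("normalizing sequence has no mapped tags")
    return tag_counts[sequence_of_interest] / denominator
```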

The term “sequence tag density” herein refers to the number of sequence reads that are mapped to a reference genome sequence; e.g. the sequence tag density for chromosome 21 is the number of sequence reads generated by the sequencing method that are mapped to chromosome 21 of the reference genome. The term “sequence tag density ratio” herein refers to the ratio of the number of sequence tags that are mapped to a chromosome of the reference genome, e.g. chromosome 21, to the length of that reference genome chromosome, e.g. chromosome 21.
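The sequence tag density ratio divides each chromosome's tag count by the reference length of the same chromosome. In the sketch below, the chromosome names and lengths are placeholders and are not taken from any reference assembly. `tag_density_ratios` is a hypothetical helper.

```python
from typing import Dict, Mapping


def tag_density_ratios(
    tag_counts: Mapping[str, int], lengths_bp: Mapping[str, int]
) -> Dict[str, float]:
    """Sequence tags per base pair of reference length, for each chromosome."""
    ratios: Dict[str, float] = {}
    for chrom, n_tags in tag_counts.items():
        length = lengths_bp[chrom]
        if length <= 0:
            raise ValueError(f"invalid length for {chrom}: {length}")
        ratios[chrom] = n_tags / length
    return ratios
```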

The term “Next Generation Sequencing (NGS)” herein refers to sequencing methods that allow for massively parallel sequencing of clonally amplified and of single nucleic acid molecules. Non-limiting examples of NGS include sequencing-by-synthesis using reversible dye terminators, and sequencing-by-ligation.

The term “parameter” herein refers to a numerical value that characterizes a quantitative data set and/or a numerical relationship between quantitative data sets. For example, a ratio (or function of a ratio) between the number of sequence tags mapped to a chromosome and the length of the chromosome to which the tags are mapped is a parameter.

The terms “threshold value” and “qualified threshold value” herein refer to any number that is calculated using a qualifying data set and serves as a limit of diagnosis of a copy number variation, e.g. an aneuploidy, in an organism. If a threshold is exceeded by results obtained from practicing the invention, a subject can be diagnosed with a copy number variation, e.g. trisomy 21. Appropriate threshold values for the methods described herein can be identified by analyzing normalizing values (e.g. chromosome doses, NCVs or NSVs) calculated for a training set of samples. Threshold values can be identified using qualified (i.e. unaffected) samples in a training set which comprises both qualified (i.e. unaffected) samples and affected samples. The samples in the training set known to have chromosomal aneuploidies (i.e. the affected samples) can be used to confirm that the chosen thresholds are useful in differentiating affected from unaffected samples in a test set (see the Examples herein). The choice of a threshold is dependent on the level of confidence that the user wishes to have to make the classification. In some embodiments, the training set used to identify appropriate threshold values comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 2000, at least 3000, at least 4000, or more qualified samples. It may be advantageous to use larger sets of qualified samples to improve the diagnostic utility of the threshold values.
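One way to derive a threshold window from the qualified (unaffected) samples of a training set is to take the mean of their doses plus or minus a multiple of their standard deviation. The text leaves the choice of threshold to the level of confidence the user requires. The multiplier `k` below (3 by default) is therefore an illustrative assumption, and so is the name `qualified_thresholds`.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def qualified_thresholds(qualified_doses: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) limits as mean -/+ k standard deviations.

    Needs at least two qualified doses. A larger k widens the window, giving up
    sensitivity in exchange for specificity.
    """
    if len(qualified_doses) < 2:
        raise ValueError("need at least two qualified samples")
    mu = mean(qualified_doses)
    sd = stdev(qualified_doses)
    return mu - k * sd, mu + k * sd
```

The affected samples in the training set can then be checked against the resulting window, as described above, to confirm that they fall outside it.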

The term “normalizing value” herein refers to a numerical value that relates the number of sequence tags identified for the sequence (e.g. chromosome or chromosome segment) of interest to the number of sequence tags identified for the normalizing sequence (e.g. normalizing chromosome or normalizing chromosome segment). For example, a “normalizing value” can be a chromosome dose as described elsewhere herein, or it can be an NCV (Normalized Chromosome Value) as described elsewhere herein, or it can be an NSV (Normalized Segment Value) as described elsewhere herein.
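This passage refers elsewhere for the definition of an NCV. One common way to express such a normalized value is a z-score, which is assumed here and not taken from the text above. The z-score is the test sample's chromosome dose minus the mean dose of the qualified samples, divided by the standard deviation of the qualified doses. An NSV computed from segment doses would use the same arithmetic. `normalized_value` is a hypothetical name.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_value(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """z-score of a test dose against the qualified-sample dose distribution."""
    sd = stdev(qualified_doses)
    if sd == 0:
        raise ValueError("qualified doses have zero spread")
    return (test_dose - mean(qualified_doses)) / sd
```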

The term “read” refers to a DNA sequence of sufficient length (e.g., at least about 30 bp) that can be used to identify a larger sequence or region, e.g. that can be aligned and specifically assigned to a chromosome or genomic region or gene.

The term “sequence tag” is herein used interchangeably with the term “mapped sequence tag” to refer to a sequence read that has been specifically assigned, i.e. mapped, to a larger sequence, e.g. a reference genome, by alignment. Mapped sequence tags are uniquely mapped to a reference genome, i.e. they are assigned to a single location on the reference genome. Tags that can be mapped to more than one location on a reference genome, i.e. tags that do not map uniquely, are not included in the analysis.
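Counting only uniquely mapped tags can be sketched as follows. The input format, one list of candidate (chromosome, position) hits per read, is a hypothetical stand-in for whatever the aligner actually reports. Reads with no hits or with more than one hit are discarded, matching the rule that tags which do not map uniquely are excluded from the analysis.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Hit = Tuple[str, int]  # hypothetical (chromosome, position) of one alignment


def count_unique_tags(reads: Iterable[List[Hit]]) -> Counter:
    """Count sequence tags per chromosome, keeping only uniquely mapped reads."""
    counts: Counter = Counter()
    for hits in reads:
        # Unmapped (0 hits) and multi-mapped (>1 hit) reads are excluded.
        if len(hits) == 1:
            chrom, _position = hits[0]
            counts[chrom] += 1
    return counts
```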

As used herein, the terms “aligned”, “alignment”, or “aligning” refer to one or more sequences that are identified as a match in terms of the order of their nucleic acid molecules to a known sequence from a reference genome. Such alignment can be done manually or by a computer algorithm, examples including the Efficient Local Alignment of Nucleotide Data (ELAND) computer program distributed as part of the Illumina Genomics Analysis pipeline. The matching of a sequence read in aligning can be a 100% sequence match or less than 100% (non-perfect match).

As used herein, the term “reference genome” refers to any particular known genome sequence, whether partial or complete, of any organism or virus which may be used to reference identified sequences from a subject. For example, a reference genome used for human subjects as well as many other organisms is found on the worldwide web at the National Center for Biotechnology Information at ncbi.nlm.nih.gov. A “genome” refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.

The term “clinically-relevant sequence” herein refers to a nucleic acid sequence that is known or is suspected to be associated with or implicated in a genetic or disease condition. Determining the absence or presence of a clinically-relevant sequence can be useful in determining a diagnosis or confirming a diagnosis of a medical condition, or providing a prognosis for the development of a disease.

The term “derived” when used in the context of a nucleic acid or a mixture of nucleic acids, herein refers to the means whereby the nucleic acid(s) are obtained from the source from which they originate. For example, in one embodiment, a mixture of nucleic acids that is derived from two different genomes means that the nucleic acids, e.g. cfDNA, were naturally released by cells through naturally occurring processes such as necrosis or apoptosis. In another embodiment, a mixture of nucleic acids that is derived from two different genomes means that the nucleic acids were extracted from two different types of cells from a subject.

The term “mixed sample” herein refers to a sample containing a mixture of nucleic acids, which are derived from different genomes.

The term “maternal sample” herein refers to a biological sample obtained from a pregnant subject, e.g. a woman.

The term “biological fluid” herein refers to a liquid taken from a biological source and includes, for example, blood, serum, plasma, sputum, lavage fluid, cerebrospinal fluid, urine, semen, sweat, tears, saliva, and the like. As used herein, the terms “blood,” “plasma” and “serum” expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the “sample” expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.

The terms “maternal nucleic acids” and “fetal nucleic acids” herein refer to the nucleic acids of a pregnant female subject and the nucleic acids of the fetus being carried by the pregnant female, respectively.

As used herein, the term “corresponding to” refers to a nucleic acid sequence, e.g. a gene or a chromosome, that is present in the genome of different subjects, and which does not necessarily have the same sequence in all genomes, but serves to provide the identity rather than the genetic information of a sequence of interest, e.g. a gene or chromosome.

As used herein, the term “substantially cell free” encompasses preparations of the desired sample from which components that are normally associated with it are removed. For example, a plasma sample is rendered essentially cell free by removing blood cells, e.g. red cells, which are normally associated with it. In some embodiments, substantially cell-free samples are processed to remove cells that would otherwise contribute to the desired genetic material that is to be tested for a CNV.

As used herein, the term “fetal fraction” refers to the fraction of fetal nucleic acids present in a sample comprising fetal and maternal nucleic acid.

As used herein, the term “chromosome” refers to the heredity-bearing gene carrier of a living cell which is derived from chromatin and which comprises DNA and protein components (especially histones). The conventional internationally recognized individual human genome chromosome numbering system is employed herein.

As used herein, the term “polynucleotide length” refers to the absolute number of nucleic acid molecules (nucleotides) in a sequence or in a region of a reference genome. The term “chromosome length” refers to the known length of the chromosome given in base pairs, e.g. provided in the NCBI36/hg18 assembly of the human chromosome found on the world wide web at genome.ucsc.edu/cgi-bin/hgTracks?hgsid=167155613&chromInfoPage=

The term “subject” herein refers to a human subject as well as a non-human subject such as a mammal, an invertebrate, a vertebrate, a fungus, a yeast, a bacterium, and a virus. Although the examples herein concern humans and the language is primarily directed to human concerns, the concept of this invention is applicable to genomes from any plant or animal, and is useful in the fields of veterinary medicine, animal sciences, research laboratories and such.

The term “condition” herein refers to “medical condition” as a broad term that includes all diseases and disorders, but can include injuries and normal health situations, such as pregnancy, that might affect a person's health, benefit from medical assistance, or have implications for medical treatments.

The term “complete” is used herein in reference to a chromosomal aneuploidy to refer to a gain or loss of an entire chromosome.

The term “partial” when used in reference to a chromosomal aneuploidy herein refers to a gain or loss of a portion of a chromosome.

The term “mosaic” is used herein to denote the presence of two populations of cells with different karyotypes in one individual who has developed from a single fertilized egg. Mosaicism may result from a mutation during development which is propagated to only a subset of the adult cells.

The term “non-mosaic” herein refers to an organism, e.g. a human fetus, composed of cells of one karyotype.

The term “using a chromosome” when used in reference to determining a chromosome dose, herein refers to using the sequence information obtained for a chromosome, i.e. the number of sequence tags obtained for a chromosome.

The term “sensitivity” as used herein is equal to the number of true positives divided by the sum of true positives and false negatives.

The term “specificity” as used herein is equal to the number of true negatives divided by the sum of true negatives and false positives.
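Both measures follow directly from counts of true and false calls. The sketch below assumes paired lists of true and predicted statuses, with True meaning affected. `sensitivity_specificity` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired true/predicted affected statuses.

    Raises ZeroDivisionError if there are no true positives + false negatives
    (or no true negatives + false positives) to divide by.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)
```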

The term “patient sample” herein refers to a biological sample obtained from a patient, i.e. a recipient of medical attention, care or treatment. The patient sample can be any of the samples described herein. Preferably, the patient sample is obtained by non-invasive procedures, e.g. a peripheral blood sample or a stool sample.

The term “hypodiploid” herein refers to a chromosome number that is one or more lower than the normal diploid number of chromosomes characteristic for the species.

DESCRIPTION

The invention provides a method for determining copy number variations (CNV) of different sequences of interest in a test sample that comprises a mixture of nucleic acids derived from two different genomes, and which are known or are suspected to differ in the amount of one or more sequences of interest. Copy number variations determined by the method of the invention include gains or losses of entire chromosomes, alterations involving very large chromosomal segments that are microscopically visible, and an abundance of sub-microscopic copy number variations of DNA segments ranging from kilobases (kb) to megabases (Mb) in size. The method comprises a statistical approach that accounts for accrued variability stemming from process-related, interchromosomal and inter-sequencing variability. The method is applicable to determining CNV of any fetal aneuploidy, and CNVs known or suspected to be associated with a variety of medical conditions. CNV that can be determined according to the present method include trisomies and monosomies of any one or more of chromosomes 1-22, X and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of the chromosomes, which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from sequence information that is obtained by sequencing the nucleic acids of a test sample only once.

CNV in the human genome significantly influence human diversity and predisposition to disease (Redon et al., Nature 23:444-454 [2006], Shaikh et al. Genome Res 19:1682-1690 [2009]). CNVs are known to contribute to genetic disease through different mechanisms, resulting in either imbalance of gene dosage or gene disruption in most cases. In addition to their direct correlation with genetic disorders, CNVs are known to mediate phenotypic changes that can be deleterious. Recently, several studies have reported an increased burden of rare or de novo CNVs in complex disorders such as autism, ADHD, and schizophrenia as compared to normal controls, highlighting the potential pathogenicity of rare or unique CNVs (Sebat et al., 316:445-449 [2007]; Walsh et al., Science 320:539-543 [2008]). CNV arise from genomic rearrangements, primarily owing to deletion, duplication, insertion, and unbalanced translocation events.

The method described herein employs next generation sequencing (NGS) technology, in which clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (e.g. as described in Volkerding et al. Clin Chem 55:641-658 [2009]; Metzker M Nature Rev 11:31-46 [2010]). In addition to high-throughput sequence information, NGS provides quantitative information, in that each sequence read is a countable “sequence tag” representing an individual clonal DNA template or a single DNA molecule. The sequencing technologies of NGS include pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation and ion semiconductor sequencing. DNA from individual samples can be sequenced individually (i.e. singleplex sequencing) or DNA from multiple samples can be pooled and sequenced as indexed genomic molecules (i.e. multiplex sequencing) on a single sequencing run, to generate up to several hundred million reads of DNA sequences. Examples of sequencing technologies that can be used to obtain the sequence information according to the present method are described below.

Sequencing Methods

Some of the sequencing technologies are available commercially, such as the sequencing-by-hybridization platform from Affymetrix Inc. (Sunnyvale, Calif.), the sequencing-by-synthesis platforms from 454 Life Sciences (Bradford, Conn.), Illumina/Solexa (Hayward, Calif.) and Helicos Biosciences (Cambridge, Mass.), and the sequencing-by-ligation platform from Applied Biosystems (Foster City, Calif.), as described below. In addition to the single molecule sequencing performed using sequencing-by-synthesis of Helicos Biosciences, other single molecule sequencing technologies include the SMRT™ technology of Pacific Biosciences, the Ion Torrent™ technology, and nanopore sequencing being developed, for example, by Oxford Nanopore Technologies. While the automated Sanger method is considered a ‘first generation’ technology, Sanger sequencing, including automated Sanger sequencing, can also be employed by the method of the invention. Additional sequencing methods include nucleic acid imaging technologies, e.g. atomic force microscopy (AFM) or transmission electron microscopy (TEM). Exemplary sequencing technologies are described below.

In one embodiment, the present method comprises obtaining sequence information for the nucleic acids in a test sample, e.g. cfDNA in a maternal sample, using the Helicos True Single Molecule Sequencing (tSMS) technology (e.g. as described in Harris T. D. et al., Science 320:106-109 [2008]). In the tSMS technique, a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3′ end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized to the flow cell surface. The templates can be at a density of about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., a HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides into the primer in a template-directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are discerned by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step. Whole genome sequencing by single molecule sequencing technologies excludes PCR-based amplification in the preparation of the sequencing libraries, and the directness of sample preparation allows for direct measurement of the sample, rather than measurement of copies of that sample.

In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal test sample, using 454 sequencing (Roche) (e.g. as described in Margulies, M. et al. Nature 437:376-380 [2005]). 454 sequencing involves two steps. In the first step, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt-ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads using, e.g., Adaptor B, which contains a 5′-biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in picoliter-sized wells. Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi), which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5′ phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is measured and analyzed.

In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal test sample, using the SOLiD™ technology (Applied Biosystems). In SOLiD™ sequencing-by-ligation, genomic DNA is sheared into fragments, and adaptors are attached to the 5′ and 3′ ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5′ and 3′ ends of the fragments, circularizing the fragments, digesting the circularized fragments to generate an internal adaptor, and attaching adaptors to the 5′ and 3′ ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3′ modification that permits bonding to a glass slide. The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed, and the process is then repeated.
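The paragraph above notes that each ligated probe reports a base, or a pair of bases, through its fluorophore. To illustrate how a pair-of-bases readout can still yield a base sequence, the sketch below assumes the two-base ("color space") encoding commonly described for SOLiD, in which A, C, G and T carry the 2-bit codes 0 to 3 and each color equals the XOR of the codes of two adjacent bases. The function names are hypothetical, and the sketch ignores probe chemistry, cycle order and error correction.

```python
BASES = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3


def encode_colors(seq: str) -> list[int]:
    """Translate a base sequence into one color per adjacent base pair (XOR of codes)."""
    codes = [BASES.index(b) for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Recover bases from a known first base and a run of colors."""
    code = BASES.index(first_base)
    out = [first_base]
    for color in colors:
        code ^= color  # next base code = previous base code XOR color
        out.append(BASES[code])
    return "".join(out)


colors = encode_colors("TACGGT")
print(colors)                      # [3, 1, 3, 0, 1]
print(decode_colors("T", colors))  # TACGGT
```

Decoding needs one known starting base (assumed here to come from the adaptor). Because each base is derived from the previous one, a single miscalled color alters every base after it, which is one reason color-space reads are often compared with the reference in color space rather than decoded first.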

In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal test sample, using the single molecule, real-time (SMRT™) sequencing technology of Pacific Biosciences. In SMRT sequencing, the continuous incorporation of dye-labeled nucleotides is imaged during DNA synthesis. Single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide detectors (ZMW detectors) that obtain sequence information while phospholinked nucleotides are being incorporated into the growing primer strand. A ZMW is a confinement structure that enables observation of the incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (in microseconds). It takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Measurement of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated.

In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal test sample, using nanopore sequencing (e.g. as described in Soni GV and Meller A. Clin Chem 53: 1996-2001 [2007]). Nanopore sequencing DNA analysis techniques are being industrially developed by a number of companies, including Oxford Nanopore Technologies (Oxford, United Kingdom). Nanopore sequencing is a single-molecule sequencing technology whereby a single molecule of DNA is sequenced directly as it passes through a nanopore. A nanopore is a small hole, of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential (voltage) across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current that flows is sensitive to the size and shape of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude of the current through the nanopore to a different degree. Thus, the change in the current as the DNA molecule passes through the nanopore represents a reading of the DNA sequence.

In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal test sample, using the chemical-sensitive field effect transistor (chemFET) array (e.g., as described in U.S. Patent Application Publication No. 20090026082). In one example of the technique, DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3′ end of the sequencing primer can be discerned by a change in current by a chemFET. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, and the nucleic acids can be amplified on the bead, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.

In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal test sample, using Halcyon Molecular's technology, which uses transmission electron microscopy (TEM). The method, termed Individual Molecule Placement Rapid Nano Transfer (IMPRNT), comprises utilizing single atom resolution transmission electron microscope imaging of high-molecular weight (150 kb or greater) DNA selectively labeled with heavy atom markers and arranging these molecules on ultra-thin films in ultra-dense (3 nm strand-to-strand) parallel arrays with consistent base-to-base spacing. The electron microscope is used to image the molecules on the films to determine the position of the heavy atom markers and to extract base sequence information from the DNA. The method is further described in PCT patent publication WO 2009/046445. The method allows for sequencing complete human genomes in less than ten minutes.

In another embodiment, the DNA sequencing technology is Ion Torrent single molecule sequencing, which pairs semiconductor technology with a simple sequencing chemistry to directly translate chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip. In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a byproduct. Ion Torrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way. Each well holds a different DNA molecule. Beneath the wells is an ion-sensitive layer and beneath that an ion sensor. When a nucleotide, for example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion will be released. The charge from that ion will change the pH of the solution, which can be detected by Ion Torrent's ion sensor. The sequencer, essentially the world's smallest solid-state pH meter, calls the base, going directly from chemical information to digital information. The Ion Personal Genome Machine (PGM™) sequencer then sequentially floods the chip with one nucleotide after another. If the next nucleotide that floods the chip is not a match, no voltage change will be recorded and no base will be called. If there are two identical bases on the DNA strand, the voltage will be doubled, and the chip will call two identical bases. Direct detection allows recordation of nucleotide incorporation in seconds.
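The flow-by-flow behavior described above (no signal for a non-matching nucleotide, a doubled signal for two identical bases) can be sketched as rounding each normalized flow signal to a whole number of incorporated bases. This is a simplified, hypothetical base caller: the function name, the flow order "TACG" and the signal values are assumptions, and real instruments additionally correct for effects such as signal decay and phasing.

```python
def call_bases(flow_order: str, signals: list[float]) -> str:
    """Turn normalized per-flow signals into a base sequence.

    Each flow introduces one nucleotide. A signal near 0 means no incorporation,
    near 1 means one base, near 2 means a two-base homopolymer, and so on.
    """
    read = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # flows cycle through the order
        count = max(0, round(signal))  # nearest whole number of incorporated bases
        read.append(nucleotide * count)
    return "".join(read)


# Hypothetical normalized signals for eight flows.
signals = [1.05, 0.02, 1.97, 0.10, 0.03, 0.96, 0.08, 1.02]
print(call_bases("TACG", signals))  # TCCAG
```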

In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal test sample, using sequencing by hybridization. Sequencing-by-hybridization comprises contacting the plurality of polynucleotide sequences with a plurality of polynucleotide probes, wherein each of the plurality of polynucleotide probes can be optionally tethered to a substrate. The substrate might be a flat surface comprising an array of known nucleotide sequences. The pattern of hybridization to the array can be used to determine the polynucleotide sequences present in the sample. In other embodiments, each probe is tethered to a bead, e.g., a magnetic bead or the like. Hybridization to the beads can be determined and used to identify the plurality of polynucleotide sequences within the sample.

In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g. cfDNA in a maternal test sample, by massively parallel sequencing of millions of DNA fragments using Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry (e.g. as described in Bentley et al., Nature 6:53-59 [2009]). Template DNA can be genomic DNA, e.g. cfDNA. In some embodiments, genomic DNA from isolated cells is used as the template, and it is fragmented into lengths of several hundred base pairs. In other embodiments, cfDNA is used as the template, and fragmentation is not required as cfDNA exists as short fragments. For example, fetal cfDNA circulates in the bloodstream as fragments approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), and no fragmentation of the DNA is required prior to sequencing. Illumina's sequencing technology relies on the attachment of fragmented genomic DNA to a planar, optically transparent surface on which oligonucleotide anchors are bound. Template DNA is end-repaired to generate 5′-phosphorylated blunt ends, and the polymerase activity of Klenow fragment is used to add a single A base to the 3′ end of the blunt phosphorylated DNA fragments. This addition prepares the DNA fragments for ligation to oligonucleotide adapters, which have an overhang of a single T base at their 3′ end to increase ligation efficiency. The adapter oligonucleotides are complementary to the flow-cell anchors. Under limiting-dilution conditions, adapter-modified, single-stranded template DNA is added to the flow cell and immobilized by hybridization to the anchors. Attached DNA fragments are extended and bridge amplified to create an ultra-high density sequencing flow cell with hundreds of millions of clusters, each containing ~1,000 copies of the same template. In one embodiment, the randomly fragmented genomic DNA, e.g. cfDNA, is amplified using PCR before it is subjected to cluster amplification. Alternatively, an amplification-free genomic library preparation is used, and the randomly fragmented genomic DNA, e.g. cfDNA, is enriched using the cluster amplification alone (Kozarewa et al., Nature Methods 6:291-295 [2009]). The templates are sequenced using a robust four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. High-sensitivity fluorescence detection is achieved using laser excitation and total internal reflection optics. Short sequence reads of about 20-40 bp, e.g. 36 bp, are aligned against a repeat-masked reference genome, and unique mappings of the short sequence reads to the reference genome are identified using specially developed data analysis pipeline software. Non-repeat-masked reference genomes can also be used. Whether repeat-masked or non-repeat-masked reference genomes are used, only reads that map uniquely to the reference genome are counted. After completion of the first read, the templates can be regenerated in situ to enable a second read from the opposite end of the fragments. Thus, either single-end or paired end sequencing of the DNA fragments can be used. Partial sequencing of DNA fragments present in the sample is performed, and sequence tags comprising reads of predetermined length, e.g. 36 bp, that are mapped to a known reference genome are counted. In one embodiment, the reference genome sequence is the NCBI36/hg18 sequence, which is available on the world wide web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. Alternatively, the reference genome sequence is GRCh37/hg19, which is available on the world wide web at genome.ucsc.edu/cgi-bin/hgGateway. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (the European Molecular Biology Laboratory), and the DDBJ (the DNA Databank of Japan).
A number of computer algorithms are available for aligning sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), or ELAND (Illumina, Inc., San Diego, Calif., USA). In one embodiment, one end of the clonally expanded copies of the plasma cfDNA molecules is sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software.

In some embodiments of the method described herein, the mapped sequence tags comprise sequence reads of about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. It is expected that technological advances will enable single-end reads of greater than 500 bp, enabling reads of greater than about 1000 bp when paired end reads are generated. In one embodiment, the mapped sequence tags comprise sequence reads that are 36 bp. Mapping of the sequence tags is achieved by comparing the sequence of the tag with the sequence of the reference to determine the chromosomal origin of the sequenced nucleic acid (e.g. cfDNA) molecule, and specific genetic sequence information is not needed. A small degree of mismatch (0-2 mismatches per sequence tag) may be allowed to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample.
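The mapping rule described above (compare each read with the reference, tolerate 0-2 mismatches, and count only reads that map uniquely) can be illustrated with a brute-force scan. Production aligners such as BOWTIE or ELAND use indexed search instead. The function `map_tag`, the toy reference and the reads below are hypothetical, and "unique" is taken here to mean exactly one reference position within the mismatch limit.

```python
from typing import Optional


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def map_tag(read: str, reference: dict[str, str],
            max_mismatches: int = 2) -> Optional[tuple[str, int]]:
    """Return (chromosome, position) if the read has exactly one hit, else None."""
    hits = [
        (chrom, pos)
        for chrom, seq in reference.items()
        for pos in range(len(seq) - len(read) + 1)
        if mismatches(read, seq[pos:pos + len(read)]) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None  # multi-mapped or unmapped: no tag


# Hypothetical toy reference and reads.
reference = {"chr1": "AAAAAAAAAA", "chr2": "CCCCGTACCC"}
reads = ["GTAC", "AAAA", "GTTC", "TTTT"]
tags = [t for t in (map_tag(r, reference) for r in reads) if t is not None]
print(tags)  # [('chr2', 4), ('chr2', 4)]
```

Here "AAAA" is discarded because it matches many positions on chr1, and "TTTT" because no position lies within two mismatches; "GTTC" is kept as a tag despite one mismatch.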

A plurality of sequence tags are obtained per sample. In some embodiments, at least about 3×10⁶ sequence tags, at least about 5×10⁶ sequence tags, at least about 8×10⁶ sequence tags, at least about 10×10⁶ sequence tags, at least about 15×10⁶ sequence tags, at least about 20×10⁶ sequence tags, at least about 30×10⁶ sequence tags, at least about 40×10⁶ sequence tags, or at least about 50×10⁶ sequence tags comprising between 20 and 40 bp reads, e.g. 36 bp, are obtained from mapping the reads to the reference genome per sample. In one embodiment, all the sequence reads are mapped to all regions of the reference genome. In one embodiment, the tags that have been mapped to all regions, e.g. all chromosomes, of the reference genome are counted, and the CNV, i.e. the over- or under-representation of a sequence of interest, e.g. a chromosome or portion thereof, in the mixed DNA sample is determined. The method does not require differentiation between the two genomes.
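Counting uniquely mapped tags per chromosome, and expressing each count as a fraction of all tags in the sample, is what exposes over- or under-representation. A minimal sketch, assuming tags are available as (chromosome, position) pairs; the function names and the tag list are hypothetical.

```python
from collections import Counter


def count_tags(tags):
    """Count uniquely mapped sequence tags per chromosome from (chromosome, position) pairs."""
    return Counter(chrom for chrom, _position in tags)


def representation(counts, chrom):
    """Fraction of all tags in the sample that fall on `chrom`."""
    return counts[chrom] / sum(counts.values())


# Hypothetical tags for illustration.
tags = [("chr21", 100), ("chr1", 5), ("chr21", 250), ("chr18", 7), ("chr1", 900)]
counts = count_tags(tags)
print(counts["chr21"], counts["chr13"])  # 2 0
print(representation(counts, "chr21"))   # 0.4
```

A `Counter` returns 0 for a chromosome with no tags, so absent chromosomes need no special handling.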

The accuracy required for correctly determining whether a CNV, e.g. an aneuploidy, is present or absent in a sample is predicated on the variation of the number of sequence tags that map to the reference genome among samples within a sequencing run (inter-chromosomal variability), and the variation of the number of sequence tags that map to the reference genome in different sequencing runs (inter-sequencing variability). For example, the variations can be particularly pronounced for tags that map to GC-rich or GC-poor reference sequences. Other variations can result from using different protocols for the extraction and purification of the nucleic acids, the preparation of the sequencing libraries, and the use of different sequencing platforms. The present method uses sequence doses (chromosome doses, or segment doses) based on the knowledge of normalizing sequences (normalizing chromosome sequences or normalizing segment sequences) to intrinsically account for the accrued variability stemming from interchromosomal (intra-run), inter-sequencing (inter-run) and platform-dependent variability. Chromosome doses are based on the knowledge of a normalizing chromosome sequence, which can be composed of a single chromosome, or of two or more chromosomes selected from chromosomes 1-22, X, and Y. Alternatively, normalizing chromosome sequences can be composed of a single chromosome segment, or of two or more segments of one chromosome or of two or more chromosomes. Segment doses are based on the knowledge of a normalizing segment sequence, which can be composed of a single segment of any one chromosome, or of two or more segments of any two or more of chromosomes 1-22, X, and Y.

Determination of Normalizing Sequences in Qualified Samples: Normalizing Chromosome Sequences and Normalizing Segment Sequences

Normalizing sequences are identified using sequence information from a set of qualified samples obtained from subjects known to comprise cells having a normal copy number for any one sequence of interest, e.g. a chromosome or segment thereof. Determination of normalizing sequences is outlined in steps 110, 120, 130, 140, and 145 of the embodiment of the method depicted in FIG. 1. The sequence information obtained from the qualified samples is also used for the statistically meaningful identification of chromosomal aneuploidies in test samples (step 155, FIG. 1, and Examples). FIG. 1 provides a flow diagram of an embodiment of the method of the invention 100 for determining a CNV of a sequence of interest, e.g. a chromosome or segment thereof, in a biological sample. In some embodiments, a biological sample is obtained from a subject and comprises a mixture of nucleic acids contributed by different genomes. The different genomes can be contributed to the sample by two individuals, e.g. the different genomes are contributed by the fetus and the mother carrying the fetus. Alternatively, the genomes are contributed to the sample by aneuploid cancerous cells and normal euploid cells from the same subject, e.g. a plasma sample from a cancer patient.

A set of qualified samples is obtained to identify qualified normalizing sequences and to provide variance values for use in the statistically meaningful identification of CNV in test samples. In step 110, a plurality of biological qualified samples is obtained from a plurality of subjects known to comprise cells having a normal copy number for any one sequence of interest. In one embodiment, the qualified samples are obtained from mothers pregnant with a fetus that has been confirmed using cytogenetic means to have a normal copy number of chromosomes. The biological qualified samples may be a biological fluid, e.g. plasma, or any suitable sample as described below. In some embodiments, a qualified sample contains a mixture of nucleic acid molecules, e.g. cfDNA molecules. In some embodiments, the qualified sample is a maternal plasma sample that contains a mixture of fetal and maternal cfDNA molecules. Sequence information for normalizing chromosomes and/or segments thereof is obtained by sequencing at least a portion of the nucleic acids, e.g. fetal and maternal nucleic acids, using any known sequencing method. Preferably, any one of the Next Generation Sequencing (NGS) methods described elsewhere herein is used to sequence the fetal and maternal nucleic acids as single or clonally amplified molecules.

In step 120, at least a portion of each of the qualified nucleic acids contained in the qualified samples is sequenced to generate millions of sequence reads, e.g. 36 bp reads, which are aligned to a reference genome, e.g. hg18. In some embodiments, the sequence reads comprise about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. It is expected that technological advances will enable single-end reads of greater than 500 bp, enabling reads of greater than about 1000 bp when paired end reads are generated. In one embodiment, the mapped sequence reads comprise 36 bp. Sequence reads are aligned to a reference genome, and the reads that are uniquely mapped to the reference genome are known as sequence tags. In one embodiment, at least about 3×10⁶ qualified sequence tags, at least about 5×10⁶ qualified sequence tags, at least about 8×10⁶ qualified sequence tags, at least about 10×10⁶ qualified sequence tags, at least about 15×10⁶ qualified sequence tags, at least about 20×10⁶ qualified sequence tags, at least about 30×10⁶ qualified sequence tags, at least about 40×10⁶ qualified sequence tags, or at least about 50×10⁶ qualified sequence tags comprising between 20 and 40 bp reads are obtained from reads that map uniquely to a reference genome.

In step 130, all the tags obtained from sequencing the nucleic acids in the qualified samples are counted to determine a qualified sequence tag density. In one embodiment, the sequence tag density is determined as the number of qualified sequence tags mapped to the sequence of interest on the reference genome. In another embodiment, the qualified sequence tag density is determined as the number of qualified sequence tags mapped to a sequence of interest normalized to the length of the qualified sequence of interest to which they are mapped. Sequence tag densities that are determined as a ratio of the tag density relative to the length of the sequence of interest are herein referred to as tag density ratios. Normalization to the length of the sequence of interest is not required, and may be included as a step to reduce the number of digits in a number to simplify it for human interpretation. As all qualified sequence tags are mapped and counted in each of the qualified samples, the sequence tag density for a sequence of interest, e.g. a clinically relevant sequence, in the qualified samples is determined, as are the sequence tag densities for additional sequences from which normalizing sequences are identified subsequently.
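A tag density ratio, as defined above, divides the tag count for a sequence of interest by its length. The sketch below reports tags per megabase; the function name, the tag count and the rounded chromosome length are hypothetical values, not data from this document.

```python
def tag_density_ratio(tag_count: int, length_bp: int, per_bp: int = 1_000_000) -> float:
    """Tags per `per_bp` bases of the sequence of interest (tags per Mb by default)."""
    return tag_count * per_bp / length_bp


# Hypothetical tag count and an illustrative, rounded chromosome length.
print(tag_density_ratio(120_000, 48_000_000))  # 2500.0
```

Because a sequence's length is the same in every sample, a dose computed from density ratios, (N_i/L_i)/(N_j/L_j), equals the dose computed from raw counts, N_i/N_j, multiplied by the constant L_j/L_i. A positive constant factor rescales the mean and the standard deviation of the doses equally, so z-scores computed across samples are unchanged, which is consistent with length normalization being optional.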

In some embodiments, the sequence of interest is a chromosome that is associated with a complete chromosomal aneuploidy, e.g. chromosome 21, and the qualified normalizing sequence is a complete chromosome that is not associated with a chromosomal aneuploidy and whose variation in sequence tag density best approximates that of the sequence (i.e. chromosome) of interest, e.g. chromosome 21. Any one or more of chromosomes 1-22, X, and Y can be a sequence of interest, and one or more chromosomes can be identified as the normalizing sequence for each of any one of chromosomes 1-22, X and Y in the qualified samples. The normalizing chromosome can be an individual chromosome or it can be a group of chromosomes as described elsewhere herein.

In another embodiment, the sequence of interest is a segment of a chromosome associated with a partial aneuploidy, e.g. a chromosomal deletion or insertion, or unbalanced chromosomal translocation, and the normalizing sequence is a chromosomal segment that is not associated with the partial aneuploidy and whose variation in sequence tag density best approximates that of the chromosome segment associated with the partial aneuploidy. Any one or more segments of any one or more chromosomes 1-22, X, and Y can be a sequence of interest.

In all embodiments, whether a single sequence or a group of sequences is identified in the qualified samples as the normalizing sequence for any one or more sequences of interest, the qualified normalizing sequence has a variation in sequence tag density that best approximates that of the sequence of interest as determined in the qualified samples. For example, a qualified normalizing sequence is a sequence that has the smallest variability, i.e. its variability is closest to that of the sequence of interest, so that the ratio of the two varies least among the qualified samples.
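One way to operationalize "smallest variability" is to compute, for each candidate normalizing chromosome, the dose of the chromosome of interest in every qualified sample and keep the candidate whose doses have the lowest coefficient of variation (standard deviation divided by mean). This criterion, the function names and the tag counts are assumptions made for illustration, not a definitive selection procedure.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def pick_normalizing_chromosome(qualified_counts, interest, candidates):
    """qualified_counts: one dict (chromosome -> tag count) per qualified sample."""
    def dose_cv(candidate):
        doses = [sample[interest] / sample[candidate] for sample in qualified_counts]
        return coefficient_of_variation(doses)
    return min(candidates, key=dose_cv)  # candidate whose doses vary least


# Hypothetical tag counts for three qualified samples.
qualified = [
    {"chr21": 100, "chr9": 200, "chr14": 300},
    {"chr21": 110, "chr9": 220, "chr14": 300},
    {"chr21": 90, "chr9": 180, "chr14": 300},
]
print(pick_normalizing_chromosome(qualified, "chr21", ["chr9", "chr14"]))  # chr9
```

In this toy data chr9 rises and falls with chr21 from sample to sample, so the chr21/chr9 dose stays constant, while chr14 does not track chr21 and its dose varies.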

In some embodiments, the normalizing sequence is a sequence that bestdistinguishes one or more qualified, samples from one or more affectedsamples, which implies that the normalizing sequence is a sequence thathas the greatest differentiability i.e. the differentiability of thenormalizing sequence is such that it provides optimal differentiation toa sequence of interest in an affected test sample to easily distinguishthe affected test sample from other unaffected samples. In otherembodiments, the normalizing sequence is a sequence that has thesmallest variability and the greatest differentiability. The level ofdifferentiability can be determined as a statistical difference betweenthe sequence doses e.g. chromosome doses or segment doses, in apopulation of qualified samples and the chromosome dose(s) in one ormore test samples as described below and shown in the Examples. Forexample, differentiability can be represented numerically as a T-testvalue, which represents the statistical difference between thechromosome doses in a population of qualified samples and the chromosomedose(s) in one or more test samples. Alternatively, differentiabilitycan be represented numerically as a Normalized Chromosome Value (NCV),which is a z-score for chromosome doses as long as the distribution forthe NCV is normal. Similarly, differentiability can be representednumerically as a T-test value, which represents the statisticaldifference between the segment doses in a population of qualifiedsamples and the segment dose(s) in one or more test samples.Alternatively, differentiability of segment doses can be representednumerically as a Normalized Segment Value (NSV), which is a z-score forchromosome doses as long as the distribution for the NSV is normal. 
In determining the z-score, the mean and standard deviation of chromosome or segment doses in a set of qualified samples can be used. Alternatively, the mean and standard deviation of chromosome or segment doses in a training set comprising qualified samples and affected samples can be used.
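As a minimal sketch of the z-score described above (an illustration, not the authors' implementation), the NCV or NSV of a test sample can be computed as NCV = (x − μ)/σ, where x is the test dose and μ and σ are the mean and standard deviation of the doses in the qualified samples. The function name `normalized_value` is hypothetical, and the choice of the sample (n − 1) standard deviation is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_value(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """Return the z-score (NCV or NSV) of a test dose.

    Hypothetical helper: the mean and the sample standard deviation are taken
    over the doses measured in the qualified (unaffected) samples.
    """
    if len(qualified_doses) < 2:
        raise ValueError("need at least two qualified doses")
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample (n - 1) standard deviation
    if sigma == 0:
        raise ValueError("qualified doses have zero spread")
    return (test_dose - mu) / sigma


# Illustrative doses only; these are not data from the Examples.
qualified = [0.30, 0.31, 0.29, 0.30, 0.30]
print(round(normalized_value(0.32, qualified), 2))
```

If the training-set variant is preferred, the same function can be called with doses from both qualified and affected samples as `qualified_doses`.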

The method identifies sequences that inherently have similar characteristics and that are prone to similar variations among samples and sequencing runs, and which are useful for determining sequence doses in test samples.

Determination of Sequence Doses (i.e. Chromosome Doses or Segment Doses) in Qualified Samples

In step 140, based on the calculated qualified tag densities, a qualified sequence dose, i.e. a chromosome dose or a segment dose, for a sequence of interest is determined as the ratio of the sequence tag density for the sequence of interest and the qualified sequence tag density for additional sequences from which normalizing sequences are subsequently identified in step 145. The identified normalizing sequences are then used to determine sequence doses in test samples.

In one embodiment, the sequence dose in the qualified samples is a chromosome dose that is calculated as the ratio of the number of sequence tags for a chromosome of interest and the number of sequence tags for a normalizing chromosome sequence in a qualified sample. The normalizing chromosome sequence can be a single chromosome, a group of chromosomes, a segment of one chromosome, or a group of segments from different chromosomes. Accordingly, a chromosome dose for a chromosome of interest is determined in a qualified sample as (i) the ratio of the number of tags for a chromosome of interest and the number of tags for a normalizing chromosome sequence composed of a single chromosome, (ii) the ratio of the number of tags for a chromosome of interest and the number of tags for a normalizing chromosome sequence composed of two or more chromosomes, (iii) the ratio of the number of tags for a chromosome of interest and the number of tags for a normalizing segment sequence composed of a single segment of a chromosome, (iv) the ratio of the number of tags for a chromosome of interest and the number of tags for a normalizing segment sequence composed of two or more segments from one chromosome, or (v) the ratio of the number of tags for a chromosome of interest and the number of tags for a normalizing segment sequence composed of two or more segments of two or more chromosomes. Examples of determining a chromosome dose for chromosome of interest 21 according to (i)-(v) are as follows: chromosome doses for a chromosome of interest, e.g. chromosome 21, are determined as a ratio of the sequence tag density of chromosome 21 and the sequence tag density for each of the remaining chromosomes, i.e. chromosomes 1-20, chromosome 22, chromosome X, and chromosome Y (i); chromosome doses for a chromosome of interest, e.g. chromosome 21, are determined as a ratio of the sequence tag density of chromosome 21 and the sequence tag density for all possible combinations of two or more remaining chromosomes (ii); chromosome doses for a chromosome of interest, e.g. chromosome 21, are determined as a ratio of the sequence tag density of chromosome 21 and the sequence tag density for a segment of another chromosome, e.g. chromosome 9 (iii); chromosome doses for a chromosome of interest, e.g. chromosome 21, are determined as a ratio of the sequence tag density of chromosome 21 and the sequence tag density for two segments of one other chromosome, e.g. two segments of chromosome 9 (iv); and chromosome doses for a chromosome of interest, e.g. chromosome 21, are determined as a ratio of the sequence tag density of chromosome 21 and the sequence tag density for two segments of two different chromosomes, e.g. a segment of chromosome 9 and a segment of chromosome 14 (v).
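The dose calculations in cases (i) and (ii) reduce to a single ratio in which the denominator is the summed tag count of the normalizing sequence. The sketch below assumes tag counts are already available per chromosome in a dictionary; the name `chromosome_dose` and the counts are hypothetical. Keying the dictionary by named segments instead of chromosomes would cover cases (iii)-(v) in the same way.

```python
from typing import Iterable, Mapping


def chromosome_dose(tags: Mapping[str, float], interest: str,
                    normalizing: Iterable[str]) -> float:
    """Ratio of tags for the sequence of interest to tags for the normalizing sequence.

    `normalizing` may name one chromosome (case i) or several (case ii); the
    tags of a group are summed to form a single denominator.
    """
    denominator = sum(tags[name] for name in normalizing)
    if denominator <= 0:
        raise ValueError("normalizing sequence has no tags")
    return tags[interest] / denominator


# Hypothetical tag counts for one qualified sample.
tags = {"chr21": 120_000, "chr14": 400_000, "chr9": 480_000}
print(chromosome_dose(tags, "chr21", ["chr14"]))          # single chromosome
print(chromosome_dose(tags, "chr21", ["chr9", "chr14"]))  # group of chromosomes
```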

In another embodiment, the sequence dose in the qualified samples is a segment dose that is calculated as the ratio of the number of sequence tags for a segment of interest and the number of sequence tags for a normalizing segment sequence in a qualified sample. The normalizing segment sequence can be a single segment of one chromosome, or a group of segments from one or more chromosomes. Accordingly, a segment dose for a segment of interest is determined in a qualified sample as (i) the ratio of the number of tags for a segment of interest and the number of tags for a normalizing segment sequence composed of a single segment of a chromosome, (ii) the ratio of the number of tags for a segment of interest and the number of tags for a normalizing segment sequence composed of two or more segments of one chromosome, or (iii) the ratio of the number of tags for a segment of interest and the number of tags for a normalizing segment sequence composed of two or more segments of two or more different chromosomes.
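A segment dose can be sketched in the same way once tags are counted inside genomic intervals. This example assumes each tag is represented by a single sorted, 0-based position per chromosome and that segments are half-open intervals `(chromosome, start, end)`; these representations and the names `count_tags` and `segment_dose` are hypothetical, chosen only to illustrate cases (i)-(iii).

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

# A segment is (chromosome, start, end) with a half-open interval [start, end).
Segment = Tuple[str, int, int]


def count_tags(positions: Dict[str, List[int]], segment: Segment) -> int:
    """Number of tag positions inside one segment; each list must be sorted."""
    chrom, start, end = segment
    pos = positions.get(chrom, [])
    return bisect_left(pos, end) - bisect_left(pos, start)


def segment_dose(positions: Dict[str, List[int]], interest: Segment,
                 normalizing: Sequence[Segment]) -> float:
    """Tags in the segment of interest divided by tags in the normalizing segment(s)."""
    denominator = sum(count_tags(positions, seg) for seg in normalizing)
    if denominator == 0:
        raise ValueError("normalizing segment sequence has no tags")
    return count_tags(positions, interest) / denominator


# Hypothetical sorted tag positions per chromosome.
positions = {"chr1": [10, 20, 30, 40, 150], "chr9": [5, 15, 25, 105, 115]}
# Two segments of one chromosome as the normalizing segment sequence (case ii).
print(segment_dose(positions, ("chr1", 0, 50), [("chr9", 0, 30), ("chr9", 100, 120)]))
```

Binary search keeps each interval count logarithmic in the number of tags on that chromosome, which matters when many candidate segments are evaluated.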

Chromosome doses for one or more chromosomes of interest are determined in all qualified samples, and a normalizing chromosome sequence is identified in step 145. Similarly, segment doses for one or more segments of interest are determined in all qualified samples, and a normalizing segment sequence is identified in step 145.

Identification of Normalizing Sequences from Qualified Sequence Doses

In step 145, a normalizing sequence is identified for a sequence of interest as the sequence that, based on the calculated sequence doses, results in the smallest variability in sequence dose for the sequence of interest across all qualified samples. The method identifies sequences that inherently have similar characteristics and that are prone to similar variations among samples and sequencing runs, and which are useful for determining sequence doses in test samples.
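Step 145 can be sketched for the single-chromosome case by computing, for each candidate, the dose of the sequence of interest in every qualified sample and keeping the candidate whose doses have the lowest coefficient of variation (CV). Using the CV as the variability measure follows the text's reference to CV; the function names, the dictionary-of-counts input, and the counts themselves are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Mapping, Sequence, Tuple


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def best_single_normalizer(samples: List[Mapping[str, float]], interest: str,
                           candidates: Sequence[str]) -> Tuple[str, float]:
    """Pick the candidate whose dose for `interest` has the lowest CV across qualified samples."""
    scores = {}
    for candidate in candidates:
        if candidate == interest:
            continue
        doses = [sample[interest] / sample[candidate] for sample in samples]
        scores[candidate] = coefficient_of_variation(doses)
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical tag counts for three qualified samples.
qualified = [
    {"chr21": 100, "chr14": 400, "chr9": 500},
    {"chr21": 200, "chr14": 800, "chr9": 500},
    {"chr21": 150, "chr14": 600, "chr9": 900},
]
print(best_single_normalizer(qualified, "chr21", ["chr9", "chr14"]))
```

In this toy data chromosome 14 tracks chromosome 21 exactly, so its doses have zero spread; real qualified samples would give a small but nonzero CV.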

Normalizing sequences for one or more sequences of interest can be identified in a set of qualified samples, and the sequences that are identified in the qualified samples are used subsequently to calculate sequence doses for one or more sequences of interest in each of the test samples (step 150) to determine the presence or absence of aneuploidy in each of the test samples. The normalizing sequence identified for chromosomes or segments of interest may differ when different sequencing platforms are used and/or when differences exist in the purification of the nucleic acid that is to be sequenced and/or the preparation of the sequencing library. The use of normalizing sequences according to the method of the invention provides a specific and sensitive measure of a variation in copy number of a chromosome or segment thereof, irrespective of the sample preparation and/or sequencing platform that is used.

In some embodiments, more than one normalizing sequence is identified, i.e. different normalizing sequences can be determined for one sequence of interest, and multiple sequence doses can be determined for one sequence of interest. For example, the variation, e.g. the coefficient of variation, in chromosome dose for chromosome of interest 21 is least when the sequence tag density of chromosome 14 is used. However, two, three, four, five, six, seven, eight or more normalizing sequences can be identified for use in determining a sequence dose for a sequence of interest in a test sample. As an example, a second dose for chromosome 21 in any one test sample can be determined using chromosome 7, chromosome 9, chromosome 11 or chromosome 12 as the normalizing chromosome sequence, as these chromosomes all have a CV close to that for chromosome 14 (see Example 2, Table 2). Preferably, when a single chromosome is chosen as the normalizing chromosome sequence for a chromosome of interest, the normalizing chromosome sequence will be a chromosome that results in chromosome doses for the chromosome of interest that have the smallest variability across all samples tested, e.g. qualified samples.

Normalizing Chromosome Sequence as a Normalizing Sequence for Chromosome(s)

In other embodiments, a normalizing chromosome sequence can be a single sequence or it can be a group of sequences. For example, in some embodiments, a normalizing sequence is a group of sequences, e.g. a group of chromosomes, that is identified as the normalizing sequence for any one or more of chromosomes 1-22, X and Y. The group of chromosomes that composes the normalizing sequence for a chromosome of interest, i.e. a normalizing chromosome sequence, can be a group of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, or twenty-two chromosomes, including or excluding one or both of chromosomes X and Y. The group of chromosomes that is identified as the normalizing chromosome sequence is a group of chromosomes that results in chromosome doses for the chromosome of interest that have the smallest variability across all samples tested, e.g. qualified samples. Preferably, individual chromosomes and groups of chromosomes are tested together for their ability to best mimic the behavior of the sequence of interest for which they are chosen as normalizing chromosome sequences.

In one embodiment, the normalizing sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the normalizing sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14. Alternatively, the normalizing sequence for chromosome 21 is a group of chromosomes selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the group of chromosomes is a group selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14.

In some embodiments, the method is further improved by using a normalizing sequence that is determined by systematic calculation of all chromosome doses using each chromosome individually and in all possible combinations with all remaining chromosomes (see Example 7). For example, a systematically determined normalizing chromosome can be determined for each chromosome of interest by systematically calculating all possible chromosome doses using any one of chromosomes 1-22, X, and Y, and combinations of two or more of chromosomes 1-22, X, and Y, to determine which single chromosome or group of chromosomes is the normalizing chromosome that results in the least variability of the chromosome dose for a chromosome of interest across a set of qualified samples (see Example 7). Accordingly, in one embodiment, the systematically calculated normalizing chromosome sequence for chromosome 21 is a group of chromosomes consisting of chromosome 4, chromosome 14, chromosome 16, chromosome 20, and chromosome 22. Single chromosomes or groups of chromosomes can be determined in this way for all chromosomes in the genome.
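The systematic calculation can be sketched as an exhaustive search over every non-empty combination of candidate chromosomes, scoring each group by the CV of the resulting doses across the qualified samples. This is an illustrative brute-force version under the same hypothetical dictionary-of-counts input as above, not the procedure of Example 7; the optional `max_size` parameter is an assumption added to bound the search.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import List, Mapping, Optional, Sequence, Tuple


def dose_cv(samples: List[Mapping[str, float]], interest: str,
            group: Sequence[str]) -> float:
    """CV of the doses of `interest` normalized by the summed tags of `group`."""
    doses = [s[interest] / sum(s[c] for c in group) for s in samples]
    return stdev(doses) / mean(doses)


def systematic_normalizer(samples: List[Mapping[str, float]], interest: str,
                          candidates: Sequence[str],
                          max_size: Optional[int] = None) -> Tuple[Tuple[str, ...], float]:
    """Try every single candidate and every combination of candidates; keep the lowest CV."""
    pool = [c for c in candidates if c != interest]
    limit = len(pool) if max_size is None else min(max_size, len(pool))
    best_group: Tuple[str, ...] = ()
    best_cv = float("inf")
    for size in range(1, limit + 1):
        for group in combinations(pool, size):
            score = dose_cv(samples, interest, group)
            if score < best_cv:
                best_group, best_cv = group, score
    return best_group, best_cv


# Hypothetical counts: neither chr4 nor chr14 alone tracks chr21, but their sum does.
qualified = [
    {"chr21": 100, "chr4": 150, "chr14": 250, "chr9": 500},
    {"chr21": 200, "chr4": 500, "chr14": 300, "chr9": 500},
    {"chr21": 150, "chr4": 200, "chr14": 400, "chr9": 900},
]
print(systematic_normalizer(qualified, "chr21", ["chr4", "chr9", "chr14"]))
```

With all 23 other chromosomes as candidates there are 2^23 − 1 = 8,388,607 non-empty groups, so a full search is feasible but costly; `max_size` is one way to trade completeness for speed.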

In one embodiment, the normalizing sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the normalizing sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14. Alternatively, the normalizing sequence for chromosome 18 is a group of chromosomes selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the group of chromosomes is a group selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14.

In another embodiment, the normalizing sequence for chromosome 18 is determined by systematic calculation of all possible chromosome doses using each possible normalizing chromosome individually and all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome 18 is a normalizing chromosome consisting of the group of chromosome 2, chromosome 3, chromosome 5, and chromosome 7.

In one embodiment, the normalizing sequence for chromosome X is selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the normalizing sequence for chromosome X is selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6 and chromosome 8. Alternatively, the normalizing sequence for chromosome X is a group of chromosomes selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the group of chromosomes is a group selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.

In another embodiment, the normalizing sequence for chromosome X is determined by systematic calculation of all possible chromosome doses using each possible normalizing chromosome individually and all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome X is a normalizing chromosome consisting of the group of chromosome 4 and chromosome 8.

In one embodiment, the normalizing sequence for chromosome 13 is a chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the normalizing sequence for chromosome 13 is a chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8. In another embodiment, the normalizing sequence for chromosome 13 is a group of chromosomes selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the group of chromosomes is a group selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.

In another embodiment, the normalizing sequence for chromosome 13 is determined by systematic calculation of all possible chromosome doses using each possible normalizing chromosome individually and all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome 13 is a normalizing chromosome comprising the group of chromosome 4 and chromosome 5. In another embodiment, the normalizing sequence for chromosome 13 is a normalizing chromosome consisting of the group of chromosome 4 and chromosome 5.

The variation in chromosome dose for chromosome Y is greater than 30 independent of which normalizing chromosome is used in determining the chromosome Y dose. Therefore, any one chromosome, or a group of two or more chromosomes selected from chromosomes 1-22 and chromosome X, can be used as the normalizing sequence for chromosome Y. In one embodiment, the at least one normalizing chromosome is a group of chromosomes consisting of chromosomes 1-22 and chromosome X. In another embodiment, the group of chromosomes consists of chromosome 2, chromosome 3, chromosome 4, chromosome 5, and chromosome 6.

In another embodiment, the normalizing sequence for chromosome Y is determined by systematic calculation of all possible chromosome doses using each possible normalizing chromosome individually and all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome Y is a normalizing chromosome comprising the group of chromosome 4 and chromosome 6. In another embodiment, the normalizing sequence for chromosome Y is a normalizing chromosome consisting of the group of chromosome 4 and chromosome 6.

The normalizing sequence used to calculate the dose of different chromosomes of interest, or of different segments of interest, can be the same, or it can be a different normalizing sequence for different chromosomes or segments of interest, respectively. For example, the normalizing sequence, e.g. a normalizing chromosome (one or a group), for chromosome of interest A can be the same as or different from the normalizing sequence, e.g. a normalizing chromosome (one or a group), for chromosome of interest B.

The normalizing sequence for a complete chromosome may be a complete chromosome or a group of complete chromosomes, or it may be a segment of a chromosome, or a group of segments of one or more chromosomes.

Normalizing Segment Sequence as a Normalizing Sequence for Chromosome(s)

In another embodiment, the normalizing sequence for a chromosome can be a normalizing segment sequence. The normalizing segment sequence can be a single segment or a group of segments of one chromosome, or it can be composed of segments from two or more different chromosomes. A normalizing segment sequence can be determined by systematic calculation of all combinations of segment sequences in the genome. For example, a normalizing segment sequence for chromosome 21, which is approximately 47 Mbp (million base pairs), can be a single segment from chromosome 9, which is approximately 140 Mbp, that is bigger or smaller than chromosome 21. Alternatively, a normalizing sequence for chromosome 21 can be a combination of a sequence from chromosome 1 and a sequence from chromosome 12.

In one embodiment, the normalizing sequence for chromosome 21 is a normalizing segment sequence of one segment or of a group of two or more segments of chromosomes 1-20, 22, X, and Y. In another embodiment, the normalizing sequence for chromosome 18 is a segment or group of segments of chromosomes 1-17, 19-22, X, and Y. In another embodiment, the normalizing sequence for chromosome 13 is a segment or group of segments of chromosomes 1-12, 14-22, X, and Y. In another embodiment, the normalizing sequence for chromosome X is a segment or group of segments of chromosomes 1-22 and Y. In another embodiment, the normalizing sequence for chromosome Y is a segment or group of segments of chromosomes 1-22 and X. Normalizing segment sequences of single segments or groups of segments can be determined for all chromosomes in the genome. The two or more segments of a normalizing segment sequence can be segments from one chromosome, or they can be segments of two or more different chromosomes. As described for normalizing chromosome sequences, a normalizing segment sequence can be the same for two or more different chromosomes.

Normalizing Segment Sequence as a Normalizing Sequence for Chromosome Segment(s)

The presence or absence of CNV of a sequence of interest can be determined when the sequence of interest is a segment of a chromosome. Variation in the copy number of a chromosome segment allows for determining the presence or absence of a partial chromosomal aneuploidy. Described below are examples of partial chromosomal aneuploidies that are associated with various fetal abnormalities and disease conditions. The segment of the chromosome can be of any length; for example, it can range from a kilobase to hundreds of megabases. The human genome comprises just over 3 billion DNA bases, which can be divided into thousands, tens of thousands, hundreds of thousands, or millions of segments of different sizes, the copy number of which can be determined according to the present method. The normalizing sequence for a segment of a chromosome is a normalizing segment sequence, which can be a single segment from any one of chromosomes 1-22, X and Y, or a group of segments from any one or more of chromosomes 1-22, X and Y.
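Dividing the genome into segments and counting the tags that fall in each one can be sketched as below. This assumes fixed-size segments and a single 0-based position per mapped tag; the text also allows segments of different sizes, which this simple binning does not cover. The function `bin_tags` and the example positions are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bin_tags(tags: Iterable[Tuple[str, int]],
             segment_size: int) -> Dict[Tuple[str, int], int]:
    """Count mapped tags per fixed-size segment, keyed by (chromosome, segment index)."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    counts: Counter = Counter()
    for chrom, position in tags:
        # Segment index k covers positions [k * segment_size, (k + 1) * segment_size).
        counts[(chrom, position // segment_size)] += 1
    return dict(counts)


# Hypothetical 0-based tag positions, binned into 1 kb segments.
tags = [("chr21", 10), ("chr21", 999), ("chr21", 1000), ("chr13", 2500)]
print(bin_tags(tags, 1000))
```

The resulting per-segment counts can then serve as the numerators and denominators of segment doses.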

The normalizing sequence for a segment of interest is a sequence that has a variability across chromosomes and across samples that is closest to that of the segment of interest. Determination of a normalizing sequence can be performed as described for determining the normalizing sequence for a chromosome of interest when the normalizing sequence is a group of segments of any one or more of chromosomes 1-22, X and Y. A normalizing segment sequence of one segment or a group of segments can be identified by calculating segment doses using one segment, and all possible combinations of two or more segments, as normalizing sequences for the segment of interest in each sample of a set of qualified samples, i.e. samples known to be diploid for the segment of interest; the normalizing sequence is then determined as the one providing a segment dose having the lowest variability for the segment of interest across all qualified samples, as described above for normalizing chromosome sequences.

For example, for a segment of interest that is 1 Mb (megabase), the approximately 3,000 other 1 Mb segments of the approximately 3 Gb human genome can be used individually or in combination with each other to calculate segment doses for the segment of interest in a qualified set of samples, to determine which segment or group of segments would serve as the normalizing segment sequence for qualified and test samples. Segments of interest can vary from about 1000 bases to tens of megabases. Normalizing segment sequences can be composed of one or more segments of the same size as that of the sequence of interest. In other embodiments, the normalizing segment sequence can be composed of segments that differ in size from the sequence of interest and/or from each other. For example, a normalizing segment sequence for a 10,000-base sequence can be 20,000 bases long and comprise a combination of sequences of different lengths, e.g. 7,000 + 8,000 + 5,000 bases. As described elsewhere herein for normalizing chromosome sequences, normalizing segment sequences can be determined by systematic calculation of all possible chromosome and/or segment doses using each possible normalizing chromosome segment individually and all possible combinations of normalizing segments. Single segments or groups of segments can be determined for all segments and/or chromosomes in the genome.

The normalizing sequence used to calculate the dose of different chromosome segments of interest can be the same, or it can be a different normalizing sequence for different chromosome segments of interest. For example, the normalizing sequence, e.g. a normalizing segment (one or a group), for chromosome segment of interest A can be the same as or different from the normalizing sequence, e.g. a normalizing segment (one or a group), for chromosome segment of interest B.

Determination of Aneuploidies in Test Samples

Based on the identification of the normalizing sequence(s) in qualified samples, a sequence dose is determined for a sequence of interest in a test sample comprising a mixture of nucleic acids derived from genomes that differ in one or more sequences of interest.

In step 115, a test sample is obtained from a subject suspected or known to carry a clinically-relevant CNV of a sequence of interest. The test sample may be a biological fluid, e.g. plasma, or any suitable sample as described below. In some embodiments, a test sample contains a mixture of nucleic acid molecules, e.g. cfDNA molecules. In some embodiments, the test sample is a maternal plasma sample that contains a mixture of fetal and maternal cfDNA molecules.

In step 125, at least a portion of the test nucleic acids in the test sample is sequenced as described for the qualified samples to generate millions of sequence reads, e.g. 36 bp reads. As in step 120, the reads generated from sequencing the nucleic acids in the test sample are uniquely mapped to a reference genome. As described in step 120, at least about 3×10⁶ qualified sequence tags, at least about 5×10⁶ qualified sequence tags, at least about 8×10⁶ qualified sequence tags, at least about 10×10⁶ qualified sequence tags, at least about 15×10⁶ qualified sequence tags, at least about 20×10⁶ qualified sequence tags, at least about 30×10⁶ qualified sequence tags, at least about 40×10⁶ qualified sequence tags, or at least about 50×10⁶ qualified sequence tags comprising between 20 and 40 bp reads are obtained from reads that map uniquely to a reference genome.

In step 135, all the tags obtained from sequencing the nucleic acids in the test samples are counted to determine a test sequence tag density. In one embodiment, the number of test sequence tags mapped to a sequence of interest is normalized to the known length of the sequence of interest to which they are mapped to provide a test sequence tag density ratio. As described for the qualified samples, normalization to the known length of a sequence of interest is not required, and may be included as a step to reduce the number of digits in a number to simplify it for human interpretation. As all the mapped test sequence tags are counted in the test sample, the sequence tag density for a sequence of interest, e.g. a clinically-relevant sequence, in the test samples is determined, as are the sequence tag densities for additional sequences that correspond to at least one normalizing sequence identified in the qualified samples.
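Why length normalization is optional can be seen in a short sketch: dividing each tag count by a fixed sequence length rescales every sample's dose by the same constant (the ratio of the two lengths), which leaves comparisons across samples unchanged. The function `tag_density`, its per-megabase default, and the lengths used below are hypothetical and are not real chromosome sizes.

```python
def tag_density(tag_count: int, length_bp: int, per_bp: int = 1_000_000) -> float:
    """Tags per `per_bp` bases (default: tags per megabase)."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return tag_count * per_bp / length_bp


# Hypothetical counts and lengths, not real chromosome sizes.
interest = tag_density(150_000, 50_000_000)     # 3000.0 tags per Mb
normalizer = tag_density(600_000, 100_000_000)  # 6000.0 tags per Mb
density_dose = interest / normalizer
count_dose = 150_000 / 600_000
# The two doses differ only by the fixed length ratio 100 Mb / 50 Mb.
print(density_dose, count_dose * (100_000_000 / 50_000_000))
```

Because the coefficient of variation is unchanged by multiplying all doses by a constant, the choice between counts and densities does not affect which normalizing sequence is selected.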

In step 150, based on the identity of at least one normalizing sequence in the qualified samples, a test sequence dose is determined for a sequence of interest in the test sample. As described elsewhere herein, the at least one normalizing sequence can be a single sequence or a group of sequences. The sequence dose for a sequence of interest in a test sample is a ratio of the sequence tag density determined for the sequence of interest in the test sample and the sequence tag density of at least one normalizing sequence determined in the test sample, wherein the normalizing sequence in the test sample corresponds to the normalizing sequence identified in the qualified samples for the particular sequence of interest. For example, if the normalizing sequence identified for chromosome 21 in the qualified samples is determined to be a chromosome, e.g. chromosome 14, then the test sequence dose for chromosome 21 (the sequence of interest) is determined as the ratio of the sequence tag density for chromosome 21 and the sequence tag density for chromosome 14, each determined in the test sample. Similarly, chromosome doses for chromosomes 13, 18, X, Y, and other chromosomes associated with chromosomal aneuploidies are determined. A normalizing sequence for a chromosome of interest can be one chromosome or a group of chromosomes, or one chromosome segment or a group of chromosome segments. As described previously, a sequence of interest can be part of a chromosome, e.g. a chromosome segment. Accordingly, the dose for a chromosome segment can be determined as the ratio of the sequence tag density determined for the segment in the test sample and the sequence tag density for the normalizing chromosome segment in the test sample, wherein the normalizing segment in the test sample corresponds to the normalizing segment (single segment or group of segments) identified in the qualified samples for the particular segment of interest. Chromosome segments can range from kilobases (kb) to megabases (Mb) in size.

In step 155, threshold values are derived from standard deviation values established for qualified sequence doses determined in a plurality of qualified samples and sequence doses determined for samples known to be aneuploid for a sequence of interest. Accurate classification depends on the differences between the probability distributions for the different classes, i.e. types of aneuploidy. Preferably, thresholds are chosen from the empirical distribution for each type of aneuploidy, e.g. trisomy 21. Possible threshold values for classifying trisomy 13, trisomy 18, trisomy 21, and monosomy X aneuploidies were established as described in the Examples, which describe the use of the method for determining chromosomal aneuploidies by sequencing cfDNA extracted from a maternal sample comprising a mixture of fetal and maternal nucleic acids. The threshold value that is determined to distinguish samples affected for an aneuploidy of one chromosome can be the same as, or different from, the threshold that is determined to distinguish samples affected for a different aneuploidy. As shown in the Examples, the threshold value for each chromosome of interest is determined from the variability in the dose of the chromosome of interest across samples and sequencing runs. The less variable the chromosome dose for any chromosome of interest, the narrower the spread in the dose for that chromosome across all the unaffected samples that are used to set the threshold for determining different aneuploidies.

In step 160, the copy number variation of the sequence of interest is determined in the test sample by comparing the test sequence dose for the sequence of interest to at least one threshold value established from the qualified sequence doses.

In step 165, the calculated dose for a test sequence of interest is compared to the threshold values, which are chosen according to a user-defined threshold of reliability, to classify the sample as “normal,” “affected,” or “no call.” “No call” samples are samples for which a definitive diagnosis cannot be made with reliability.
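Steps 160 and 165 can be sketched as a three-way comparison of a test sample's normalized value (for example, the NCV) against two user-defined thresholds, with values falling between them reported as “no call.” The function `classify` and both threshold values below are hypothetical and chosen only for illustration; the text states that actual thresholds are set from the empirical distributions described in the Examples.

```python
def classify(ncv: float, normal_below: float, affected_at_or_above: float) -> str:
    """Three-way call for a dose increase (e.g. a trisomy).

    Values below `normal_below` are called normal, values at or above
    `affected_at_or_above` are called affected, and anything in between
    is reported as a no call.
    """
    if normal_below > affected_at_or_above:
        raise ValueError("normal threshold must not exceed affected threshold")
    if ncv >= affected_at_or_above:
        return "affected"
    if ncv < normal_below:
        return "normal"
    return "no call"


# Hypothetical thresholds chosen for illustration only.
for value in (1.1, 2.7, 4.5):
    print(value, classify(value, normal_below=2.5, affected_at_or_above=3.0))
```

For a dose decrease such as monosomy X, the same function could be applied to the negated value so that the lower tail of the qualified distribution is tested instead.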

Another embodiment of the invention provides a method for providing a prenatal diagnosis of a fetal chromosomal aneuploidy in a biological sample comprising fetal and maternal nucleic acid molecules. The diagnosis is made by obtaining sequence information from sequencing at least a portion of the mixture of fetal and maternal nucleic acid molecules derived from a biological test sample, e.g. a maternal plasma sample; computing from the sequencing data a normalizing chromosome dose for one or more chromosomes of interest and/or a normalizing segment dose for one or more segments of interest; determining a statistically significant difference between the chromosome dose for the chromosome of interest and/or the segment dose for the segment of interest, respectively, in the test sample and a threshold value established in a plurality of qualified (normal) samples; and providing the prenatal diagnosis based on the statistical difference. As described in step 165 of the method, a diagnosis of normal or affected is made. A “no call” is provided in the event that the diagnosis of normal or affected cannot be made with confidence.

Samples

Samples that are used for determining a CNV, e.g. chromosomal and partial aneuploidies, comprise nucleic acids that are present in cells or that are “cell-free”. In some embodiments of the invention it is advantageous to obtain cell-free nucleic acids, e.g. cell-free DNA (cfDNA). Cell-free nucleic acids, including cell-free DNA, can be obtained by various methods known in the art from biological samples including but not limited to plasma and serum (Chen et al., Nature Med. 2: 1033-1035 [1996]; Lo et al., Lancet 350: 485-487 [1997]). To separate cell-free DNA from cells, fractionation, centrifugation (e.g., density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or separation methods can be used.

The sample comprising the mixture of nucleic acids to which the methods described herein are applied is a biological sample such as a tissue sample, a biological fluid sample, or a cell sample. In some embodiments, the mixture of nucleic acids is purified or isolated from the biological sample by any one of the known methods. A sample can consist of purified or isolated polynucleotide, or it can comprise a biological sample such as a tissue sample, a biological fluid sample, or a cell sample. A biological fluid includes, as non-limiting examples, blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, amniotic fluid and leukophoresis samples. In some embodiments, the sample is a sample that is easily obtainable by non-invasive procedures, e.g. blood, plasma, serum, sweat, tears, sputum, urine, ear flow, saliva or feces. Preferably, the biological sample is a peripheral blood sample, or the plasma and serum fractions. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a cell culture. In another embodiment, the sample is a mixture of two or more biological samples, e.g. a biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample. As used herein, the terms “blood,” “plasma” and “serum” expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the “sample” expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.

In some embodiments, samples can be obtained from sources, including, but not limited to, samples from different individuals, different developmental stages of the same or different individuals, different diseased individuals (e.g., individuals with cancer or suspected of having a genetic disorder), normal individuals, samples obtained at different stages of a disease in an individual, samples obtained from an individual subjected to different treatments for a disease, samples from individuals subjected to different environmental factors, or individuals with predisposition to a pathology, or individuals with exposure to an infectious disease agent (e.g., HIV).

In one embodiment, the sample is a maternal sample that is obtained from a pregnant female, for example a pregnant woman. In this instance, the sample can be analyzed using the methods described herein to provide a prenatal diagnosis of potential chromosomal abnormalities in the fetus. The maternal sample can be a tissue sample, a biological fluid sample, or a cell sample. A biological fluid includes, as non-limiting examples, blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukophoresis samples. In another embodiment, the maternal sample is a mixture of two or more biological samples, e.g. a biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample. In some embodiments, the sample is a sample that is easily obtainable by non-invasive procedures, e.g. blood, plasma, serum, sweat, tears, sputum, urine, ear flow, saliva and feces. In some embodiments, the biological sample is a peripheral blood sample, or the plasma and serum fractions thereof. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a cell culture. As disclosed above, the terms “blood,” “plasma” and “serum” expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the “sample” expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.

Samples can also be obtained from in vitro cultured tissues, cells, or other polynucleotide-containing sources. The cultured samples can be taken from sources including, but not limited to, cultures (e.g., tissue or cells) maintained in different media and conditions (e.g., pH, pressure, or temperature), cultures (e.g., tissue or cells) maintained for different lengths of time, cultures (e.g., tissue or cells) treated with different factors or reagents (e.g., a drug candidate, or a modulator), or cultures of different types of tissue or cells.

Methods of isolating nucleic acids from biological sources are well known and will differ depending upon the nature of the source. One of skill in the art can readily isolate nucleic acid from a source as needed for the method described herein. In some instances, it can be advantageous to fragment the nucleic acid molecules in the nucleic acid sample. Fragmentation can be random, or it can be specific, as achieved, for example, using restriction endonuclease digestion. Methods for random fragmentation are well known in the art, and include, for example, limited DNase digestion, alkali treatment and physical shearing. In one embodiment, sample nucleic acids are obtained as cfDNA, which is not subjected to fragmentation. In other embodiments, the sample nucleic acids are obtained as genomic DNA, which is subjected to fragmentation into fragments of approximately 500 or more base pairs, and to which NGS methods can be readily applied.

Determination of CNV for Prenatal Diagnoses

Cell-free fetal DNA and RNA circulating in maternal blood can be used for the early non-invasive prenatal diagnosis (NIPD) of an increasing number of genetic conditions, both for pregnancy management and to aid reproductive decision-making. The presence of cell-free DNA circulating in the bloodstream has been known for over 50 years. More recently, the presence of small amounts of circulating fetal DNA was discovered in the maternal bloodstream during pregnancy (Lo et al., Lancet 350:485-487 [1997]). Thought to originate from dying placental cells, cell-free fetal DNA (cfDNA) has been shown to consist of short fragments typically fewer than 200 bp in length (Chan et al., Clin Chem 50:88-92 [2004]), which can be discerned as early as 4 weeks of gestation (Illanes et al., Early Human Dev 83:563-566 [2007]), and which are known to be cleared from the maternal circulation within hours of delivery (Lo et al., Am J Hum Genet 64:218-224 [1999]). In addition to cfDNA, fragments of cell-free fetal RNA (cfRNA) can also be discerned in the maternal bloodstream, originating from genes that are transcribed in the fetus or placenta. The extraction and subsequent analysis of these fetal genetic elements from a maternal blood sample offers novel opportunities for NIPD.

The present method is a polymorphism-independent method for use in NIPD that does not require that the fetal cfDNA be distinguished from the maternal cfDNA to enable the determination of a fetal aneuploidy. In some embodiments, the aneuploidy is a complete chromosomal trisomy or monosomy, or a partial trisomy or monosomy. Partial aneuploidies are caused by loss or gain of part of a chromosome, and encompass chromosomal imbalances resulting from unbalanced translocations, unbalanced inversions, deletions and insertions. By far, the most common known aneuploidy compatible with life is trisomy 21, i.e. Down Syndrome (DS), which is caused by the presence of an extra copy of part or all of chromosome 21. Rarely, DS can be caused by an inherited or sporadic defect whereby an extra copy of all or part of chromosome 21 becomes attached to another chromosome (usually chromosome 14) to form a single aberrant chromosome. DS is associated with intellectual impairment, severe learning difficulties and excess mortality caused by long-term health problems such as heart disease. Other aneuploidies with known clinical significance include Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13), which are frequently fatal within the first few months of life. Abnormalities associated with the number of sex chromosomes are also known and include monosomy X, e.g. Turner syndrome (XO), and triple X syndrome (XXX) in female births and Klinefelter syndrome (XXY) and XYY syndrome in male births, which are all associated with various phenotypes including sterility and reduction in intellectual skills. The method of the invention can be used to diagnose these and other chromosomal abnormalities prenatally.

According to some embodiments of the present invention, the trisomies determined by the present invention include without limitation trisomy 21 (T21; Down Syndrome), trisomy 18 (T18; Edwards Syndrome), trisomy 16 (T16), trisomy 22 (T22; Cat Eye Syndrome), trisomy 15 (T15; Prader Willi Syndrome), trisomy 13 (T13; Patau Syndrome), trisomy 8 (T8; Warkany Syndrome) and the XXY (Klinefelter Syndrome), XYY, or XXX trisomies. It will be appreciated that various other complete trisomies and partial trisomies can be determined in fetal cfDNA according to the teachings of the present invention. Examples of partial trisomies include, but are not limited to, partial trisomy 1q32-44, trisomy 9p with trisomy, trisomy 4 mosaicism, trisomy 17p, partial trisomy 4q26-qter, trisomy 9, partial 2p trisomy, partial trisomy 1q, and/or partial trisomy 6p/monosomy 6q.

The method of the present invention can also be used to determine chromosomal monosomy X, and partial monosomies such as monosomy 13, monosomy 15, monosomy 16, monosomy 21, and monosomy 22, which are known to be involved in pregnancy miscarriage. Partial monosomy of chromosomes typically involved in complete aneuploidy can also be determined by the method of the invention. Monosomy 18p is a rare chromosomal disorder in which all or part of the short arm (p) of chromosome 18 is deleted (monosomic). The disorder is typically characterized by short stature, variable degrees of mental retardation, speech delays, malformations of the skull and facial (craniofacial) region, and/or additional physical abnormalities. Associated craniofacial defects may vary greatly in range and severity from case to case. Conditions caused by changes in the structure or number of copies of chromosome 15 include Angelman Syndrome and Prader-Willi Syndrome, which involve a loss of gene activity in the same part of chromosome 15, the 15q11-q13 region. It will be appreciated that several translocations and microdeletions can be asymptomatic in the carrier parent, yet can cause a major genetic disease in the offspring. For example, a healthy mother who carries the 15q11-q13 microdeletion can give birth to a child with Angelman syndrome, a severe neurodegenerative disorder. Thus, the present invention can be used to identify such a partial deletion and other deletions in the fetus. Partial monosomy 13q is a rare chromosomal disorder that results when a piece of the long arm (q) of chromosome 13 is missing (monosomic). Infants born with partial monosomy 13q may exhibit low birth weight, malformations of the head and face (craniofacial region), skeletal abnormalities (especially of the hands and feet), and other physical abnormalities. Mental retardation is characteristic of this condition. The mortality rate during infancy is high among individuals born with this disorder.
Almost all cases of partial monosomy 13q occur randomly for no apparent reason (sporadic). 22q11.2 deletion syndrome, also known as DiGeorge syndrome, is a syndrome caused by the deletion of a small piece of chromosome 22. The deletion (22q11.2) occurs near the middle of the chromosome on the long arm of one of the two copies of chromosome 22. The features of this syndrome vary widely, even among members of the same family, and affect many parts of the body. Characteristic signs and symptoms may include birth defects such as congenital heart disease, defects in the palate, most commonly related to neuromuscular problems with closure (velo-pharyngeal insufficiency), learning disabilities, mild differences in facial features, and recurrent infections. Microdeletions in chromosomal region 22q11.2 are associated with a 20 to 30-fold increased risk of schizophrenia. In one embodiment, the method of the invention is used to determine partial monosomies including but not limited to monosomy 18p, partial monosomy of chromosome 15 (15q11-q13), partial monosomy 13q, and partial monosomy of chromosome 22.

The method of the invention can also be used to determine any aneuploidy if one of the parents is a known carrier of such abnormality. These include, but are not limited to, mosaic for a small supernumerary marker chromosome (SMC); t(11;14)(p15;p13) translocation; unbalanced translocation t(8;11)(p23.2;p15.5); 11q23 microdeletion; Smith-Magenis syndrome 17p11.2 deletion; 22q13.3 deletion; Xp22.3 microdeletion; 10p14 deletion; 20p microdeletion; DiGeorge syndrome [del(22)(q11.2q11.23)]; Williams syndrome (7q11.23 and 7q36 deletions); 1p36 deletion; 2p microdeletion; neurofibromatosis type 1 (17q11.2 microdeletion); Yq deletion; Wolf-Hirschhorn syndrome (WHS, 4p16.3 microdeletion); 1p36.2 microdeletion; 11q14 deletion; 19q13.2 microdeletion; Rubinstein-Taybi (16p13.3 microdeletion); 7p21 microdeletion; Miller-Dieker syndrome (17p13.3); 17p11.2 deletion; and 2q37 microdeletion.

Determination of Complete Fetal Chromosomal Aneuploidies

In one embodiment, the present invention provides a method for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. Preferably, the method determines the presence or absence of any four or more different complete chromosomal aneuploidies. The steps of the method comprise (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y and to identify a number of sequence tags for a normalizing chromosome sequence for each of the any one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X, and Y. In step (c), the method further uses the number of sequence tags identified for each of the any one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the any one or more chromosomes of interest; and in step (d), it compares each of the single chromosome doses for each of the any one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the maternal test sample.
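Step (b) can be illustrated with a short Python sketch. It assumes, as a simplification not made in the text, that each sequence tag has already been reduced to the name of the chromosome to which its read mapped (e.g. “chr21”). The chromosome naming, `CHROMOSOMES` and `count_sequence_tags` are hypothetical choices made only for this example.

```python
from collections import Counter
from typing import Dict, Iterable

# Chromosomes of interest considered by the method: 1-22, X and Y.
# The "chrN" naming is an assumption of this sketch.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def count_sequence_tags(mapped_chromosomes: Iterable[str]) -> Dict[str, int]:
    """Return the number of sequence tags identified for each chromosome.

    Assumption: every element of `mapped_chromosomes` is the chromosome name
    of one sequence tag (a read already mapped to the reference genome).
    Names outside CHROMOSOMES (e.g. unplaced contigs) are ignored.
    """
    counts = Counter(mapped_chromosomes)
    # Report every chromosome, including those that received no tags.
    return {chrom: counts.get(chrom, 0) for chrom in CHROMOSOMES}


if __name__ == "__main__":
    tags = ["chr21", "chr14", "chr21", "chrX", "chr1", "chr21"]
    counts = count_sequence_tags(tags)
    print(counts["chr21"], counts["chr14"], counts["chrY"])  # prints: 3 1 0
```

Returning an entry for all 24 chromosomes, even those with zero tags, keeps later dose calculations from failing on a missing key.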

In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest and the number of sequence tags identified for the normalizing chromosome for each of the chromosomes of interest.
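The ratio described above can be sketched as follows. The normalizing chromosome sequence may be a single chromosome or a group of chromosomes; for a group, this sketch assumes that its tag count is the sum of the tag counts of its member chromosomes. The function name `chromosome_dose` and the toy counts are hypothetical.

```python
from typing import Mapping, Sequence


def chromosome_dose(
    tag_counts: Mapping[str, int],
    chrom_of_interest: str,
    normalizing_chroms: Sequence[str],
) -> float:
    """Single chromosome dose = tags(chromosome of interest) / tags(normalizer).

    `normalizing_chroms` holds one chromosome or a group of chromosomes; for a
    group, its tag count is assumed here to be the sum over its members.
    """
    normalizing_tags = sum(tag_counts[c] for c in normalizing_chroms)
    if normalizing_tags == 0:
        raise ValueError("normalizing chromosome sequence has no sequence tags")
    return tag_counts[chrom_of_interest] / normalizing_tags


if __name__ == "__main__":
    # Hypothetical tag counts, for illustration only.
    counts = {"chr21": 100, "chr14": 400, "chr9": 600}
    print(chromosome_dose(counts, "chr21", ["chr14"]))          # prints: 0.25
    print(chromosome_dose(counts, "chr21", ["chr14", "chr9"]))  # prints: 0.1
```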

In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence for the chromosome of interest to the length of the normalizing chromosome sequence, and calculating a chromosome dose for the chromosome of interest as a ratio of the sequence tag density of the chromosome of interest and the sequence tag density for the normalizing sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
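A minimal sketch of the tag-density variant, assuming that density means tags per base of sequence length. The text does not say which length is used (for example, full chromosome length or only unambiguously mappable bases), so the sketch leaves that choice to the caller; `density_dose` is a hypothetical name.

```python
def density_dose(
    tags_of_interest: int,
    length_of_interest: int,
    tags_normalizing: int,
    length_normalizing: int,
) -> float:
    """Chromosome dose as a ratio of sequence tag densities.

    Density is taken as tags per base of sequence length. Which length to use
    (full length or only unambiguously mappable bases) is an assumption left
    to the caller.
    """
    if length_of_interest <= 0 or length_normalizing <= 0:
        raise ValueError("sequence lengths must be positive")
    if tags_normalizing == 0:
        raise ValueError("normalizing sequence has no sequence tags")
    density_of_interest = tags_of_interest / length_of_interest
    density_normalizing = tags_normalizing / length_normalizing
    return density_of_interest / density_normalizing


if __name__ == "__main__":
    # Hypothetical counts and lengths: the normalizer is twice as long and has
    # twice as many tags, so the two densities, and hence the dose, are equal.
    print(density_dose(1_000, 50_000, 2_000, 100_000))  # prints: 1.0
```

When the chromosome of interest and the normalizing sequence have the same length, this dose reduces to the plain ratio of tag counts.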

An example of the embodiment whereby four or more complete fetal chromosomal aneuploidies are determined in a maternal test sample comprising a mixture of fetal and maternal cell-free DNA molecules comprises: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of the twenty or more chromosomes of interest to a threshold value for each of the twenty or more chromosomes of interest, and thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in the test sample.

In another embodiment, the method for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample as described above uses a normalizing segment sequence for determining the dose of the chromosome of interest. In this instance, the method comprises (a) obtaining sequence information for said fetal and maternal nucleic acids in said sample; and (b) using said sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome or it can be a group of segments from one or more different chromosomes. In step (c), the method further uses the number of sequence tags identified for each of said any one or more chromosomes of interest and said number of sequence tags identified for said normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and in step (d), it compares each of said single chromosome doses for each of said any one or more chromosomes of interest to a threshold value for each of said one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in said sample.

In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest and the number of sequence tags identified for said normalizing segment sequence for each of said chromosomes of interest.

In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence for the chromosome of interest to the length of the normalizing segment sequence, and calculating a chromosome dose for the chromosome of interest as a ratio of the sequence tag density of the chromosome of interest and the sequence tag density for the normalizing segment sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.

A means for comparing chromosome doses of different sample sets is provided by determining a normalized chromosome value (NCV), which relates the chromosome dose in a test sample to the mean of the corresponding chromosome dose in a set of qualified samples, scaled by the standard deviation of that dose. The NCV is calculated as:

${NCV_{ij}} = \frac{x_{ij} - {\hat{\mu}}_{j}}{{\hat{\sigma}}_{j}}$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
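The NCV formula can be sketched for a single chromosome j as below. The sketch assumes that $\hat{\sigma}_j$ is the sample standard deviation (n - 1 denominator) of the qualified doses; the text does not name the estimator, and the function name and toy doses are hypothetical.

```python
import statistics
from typing import Sequence


def normalized_chromosome_value(
    test_dose: float, qualified_doses: Sequence[float]
) -> float:
    """NCV_ij = (x_ij - mu_hat_j) / sigma_hat_j for one chromosome j.

    `test_dose` is x_ij; `qualified_doses` are the j-th chromosome doses of
    the qualified samples. The sample standard deviation (n - 1 denominator)
    is an assumption: the text does not name the estimator.
    """
    if len(qualified_doses) < 2:
        raise ValueError("at least two qualified samples are needed")
    mu_hat = statistics.mean(qualified_doses)
    sigma_hat = statistics.stdev(qualified_doses)
    if sigma_hat == 0:
        raise ValueError("qualified doses show no variation")
    return (test_dose - mu_hat) / sigma_hat


if __name__ == "__main__":
    qualified = [1.0, 2.0, 3.0]  # toy doses: mean 2.0, standard deviation 1.0
    print(normalized_chromosome_value(5.0, qualified))  # prints: 3.0
```

A test dose lying three estimated standard deviations above the qualified mean thus receives an NCV of 3.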

In some embodiments, the presence or absence of at least one complete fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, or twenty-four complete fetal chromosomal aneuploidies are determined in a sample, wherein twenty-two of the complete fetal chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth correspond to complete fetal chromosomal aneuploidies of chromosomes X and Y. As aneuploidies of sex chromosomes can comprise tetrasomies, pentasomies and other polysomies, the number of different complete chromosomal aneuploidies that can be determined according to the present method may be at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 complete chromosomal aneuploidies. Thus, the number of different complete fetal chromosomal aneuploidies that are determined is related to the number of chromosomes of interest that are selected for analysis.

In one embodiment, determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample as described above uses a normalizing segment sequence for one chromosome of interest, which is selected from chromosomes 1-22, X, and Y. In other embodiments, two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. In other embodiments, the any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. Different complete fetal chromosomal aneuploidies that can be determined include complete chromosomal trisomies, complete chromosomal monosomies and complete chromosomal polysomies. Examples of complete fetal chromosomal aneuploidies include without limitation trisomies of any one or more of the autosomes, e.g. trisomy 2, trisomy 8, trisomy 9, trisomy 21, trisomy 13, trisomy 16, trisomy 18, trisomy 22; trisomies of the sex chromosomes, e.g. 47,XXY, 47,XXX, and 47,XYY; tetrasomies of sex chromosomes, e.g. 48,XXYY, 48,XXXY, 48,XXXX, and 48,XYYY; pentasomies of sex chromosomes, e.g. 49,XXXYY, 49,XXXXY, 49,XXXXX, 49,XYYYY; and monosomy X. Other complete fetal chromosomal aneuploidies that can be determined according to the present method are described below.

Determination of Partial Fetal Chromosomal Aneuploidies

In another embodiment, the invention provides a method for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. The steps of the method comprise (a) obtaining sequence information for the fetal and maternal nucleic acids in said sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more segments of any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome or it can be a group of segments from one or more different chromosomes. In step (c), the method further uses the number of sequence tags identified for each of any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single segment dose for each of any one or more segments of any one or more chromosomes of interest; and in step (d), it compares each of the single segment doses for each of any one or more segments of any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomal segments of any one or more chromosomes of interest, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in said sample.

In some embodiments, step (c) comprises calculating a single segment dose for each of any one or more segments of any one or more chromosomes of interest as the ratio of the number of sequence tags identified for each of any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence for each of any one or more segments of any one or more chromosomes of interest.

In other embodiments, step (c) comprises calculating a sequence tag ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence for the segment of interest to the length of the normalizing segment sequence, and calculating a segment dose for the segment of interest as a ratio of the sequence tag density of the segment of interest and the sequence tag density for the normalizing segment sequence. The calculation is repeated for each of the segments of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
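The segment dose can be sketched in the same way once sequence tags are assigned to segments. This sketch assumes representations that the text does not define: each tag is a (chromosome, position) pair for its mapped start, each segment is a (chromosome, start, end) interval with 0-based, half-open coordinates, and the normalizing segments do not overlap. All names are hypothetical, and the linear scan favours clarity over speed.

```python
from typing import List, Tuple

# Hypothetical representations used only in this sketch:
#   a segment is (chromosome, start, end) with 0-based, half-open coordinates;
#   a sequence tag is (chromosome, position) of its mapped start.
Segment = Tuple[str, int, int]
Tag = Tuple[str, int]


def count_tags(tags: List[Tag], segments: List[Segment]) -> int:
    """Number of tags whose mapped position falls inside any of `segments`."""
    return sum(
        1
        for chrom, pos in tags
        if any(chrom == c and start <= pos < end for c, start, end in segments)
    )


def total_length(segments: List[Segment]) -> int:
    # Assumes the segments do not overlap.
    return sum(end - start for _, start, end in segments)


def segment_dose(
    tags: List[Tag], segment_of_interest: Segment, normalizing: List[Segment]
) -> float:
    """Segment dose as the ratio of tag densities (tags per base)."""
    tags_norm = count_tags(tags, normalizing)
    if tags_norm == 0:
        raise ValueError("normalizing segment sequence has no sequence tags")
    tags_interest = count_tags(tags, [segment_of_interest])
    density_interest = tags_interest / total_length([segment_of_interest])
    density_norm = tags_norm / total_length(normalizing)
    return density_interest / density_norm


if __name__ == "__main__":
    reads = [("chr22", 100), ("chr22", 150), ("chr22", 5_000),
             ("chr1", 10), ("chr1", 20), ("chr1", 1_030), ("chr1", 1_040)]
    dose = segment_dose(reads, ("chr22", 0, 1_000),
                        [("chr1", 0, 1_000), ("chr1", 1_000, 2_000)])
    print(dose)  # prints: 1.0
```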

A means for comparing segment doses of different sample sets is provided by determining a normalized segment value (NSV), which relates the segment dose in a test sample to the mean of the corresponding segment dose in a set of qualified samples, scaled by the standard deviation of that dose. The NSV is calculated as:

${NSV_{ij}} = \frac{x_{ij} - {\hat{\mu}}_{j}}{{\hat{\sigma}}_{j}}$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
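Step (d) for partial aneuploidies can be sketched by computing an NSV for each segment of interest in one test sample and comparing it with a threshold. The following are assumptions made only for this illustration: a single threshold applies to all segments, it is applied symmetrically to gains and losses, $\hat{\sigma}_j$ is the sample standard deviation, and the labels “gain”, “loss” and “unaffected” are hypothetical. The threshold value is supplied by the caller; the value in the usage line is illustrative and not taken from the text.

```python
import statistics
from typing import Dict, List, Tuple


def classify_segments(
    test_doses: Dict[str, float],
    qualified_doses: Dict[str, List[float]],
    threshold: float,
) -> Dict[str, Tuple[float, str]]:
    """Compute NSV_ij for each segment j of one test sample and compare it to a threshold.

    Assumptions for this sketch: the same threshold is used for every segment,
    symmetrically for gains and losses, and sigma_hat_j is the sample standard
    deviation. The returned labels are hypothetical.
    """
    calls = {}
    for segment, x_ij in test_doses.items():
        reference = qualified_doses[segment]
        mu_hat = statistics.mean(reference)
        sigma_hat = statistics.stdev(reference)
        nsv = (x_ij - mu_hat) / sigma_hat
        if nsv >= threshold:
            label = "gain"        # e.g. a partial duplication
        elif nsv <= -threshold:
            label = "loss"        # e.g. a partial deletion
        else:
            label = "unaffected"
        calls[segment] = (nsv, label)
    return calls


if __name__ == "__main__":
    qualified = {"seg_a": [1.0, 2.0, 3.0], "seg_b": [1.0, 2.0, 3.0]}
    test = {"seg_a": 5.0, "seg_b": 2.5}
    # threshold=3.0 is an illustrative value, not one taken from the text.
    print(classify_segments(test, qualified, threshold=3.0))
    # prints: {'seg_a': (3.0, 'gain'), 'seg_b': (0.5, 'unaffected')}
```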

In some embodiments, the presence or absence of one partial fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, or more partial fetal chromosomal aneuploidies are determined in a sample. In one embodiment, one segment of interest is selected from any one of chromosomes 1-22, X, and Y. In another embodiment, two or more segments of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the any one or more segments of interest selected from chromosomes 1-22, X, and Y comprise at least one, five, ten, 15, 20, 25 or more segments selected from chromosomes 1-22, X, and Y, and the presence or absence of at least one, five, ten, 15, 20, or 25 different partial fetal chromosomal aneuploidies is determined. Different partial fetal chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions and partial deletions. Examples of partial fetal chromosomal aneuploidies include partial monosomies and partial trisomies of autosomes. Partial monosomies of autosomes include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22. Other partial fetal chromosomal aneuploidies that can be determined according to the present method are described below.

In any one of the embodiments described above, the test sample is a maternal sample selected from blood, plasma, serum, urine and saliva samples. In some embodiments, the maternal test sample is a plasma sample. The nucleic acid molecules of the maternal sample are a mixture of fetal and maternal cell-free DNA molecules. Sequencing of the nucleic acids can be performed using next generation sequencing (NGS) as described elsewhere herein. In some embodiments, sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, sequencing is sequencing-by-ligation. In yet other embodiments, sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.

Determination of CNV of Clinical Disorders

In addition to the early determination of birth defects, the methodsdescribed herein can be applied to the determination of any abnormalityin the representation of genetic sequences within the genome.

It has been shown that blood plasma and serum DNA from cancer patients contains measurable quantities of tumor DNA, which can be recovered and used as a surrogate source of tumor DNA, and tumors are characterized by aneuploidy, or inappropriate numbers of gene sequences or even entire chromosomes. The determination of a difference in the amount of a given sequence, i.e. a sequence of interest, in a sample from an individual can thus be used in the diagnosis of a medical condition. In some embodiments, the method can be used to determine the presence or absence of a chromosomal aneuploidy in a patient suspected or known to be suffering from cancer. The method can also be applied to determining the status of a disease; to determining the presence or absence of nucleic acids of a pathogen, e.g. a virus; to determining chromosomal abnormalities associated with graft versus host disease (GVHD); and to determining the contribution of individuals in forensic analyses.

Embodiments of the invention provide for a method to assess copy number variation of a sequence of interest, e.g. a clinically-relevant sequence, in a test sample that comprises a mixture of nucleic acids derived from two different genomes, and which are known or are suspected to differ in the amount of one or more sequence of interest. The mixture of nucleic acids is derived from two or more types of cells. In one embodiment, the mixture of nucleic acids is derived from normal and cancerous cells derived from a subject suffering from a medical condition, e.g. cancer.

The development of cancer is often accompanied by an alteration in the number of whole chromosomes, i.e. complete chromosomal aneuploidy, and/or an alteration in the number of segments of chromosomes, i.e. partial aneuploidy, caused by a process known as chromosome instability (CIN) (Thoma et al., Swiss Med Weekly 2011:141:w13170). It is believed that many solid tumors, such as breast cancer, progress from initiation to metastasis through the accumulation of several genetic aberrations (Sato et al., Cancer Res., 50: 7184-7189 [1990]; Jongsma et al., J Clin Pathol: Mol Path 55:305-309 [2002]). Such genetic aberrations, as they accumulate, may confer proliferative advantages, genetic instability and the attendant ability to evolve drug resistance rapidly, and enhanced angiogenesis, proteolysis and metastasis. The genetic aberrations may affect either recessive “tumor suppressor genes” or dominantly acting oncogenes. Deletions and recombination leading to loss of heterozygosity (LOH) are believed to play a major role in tumor progression by uncovering mutated tumor suppressor alleles.

cfDNA has been found in the circulation of patients diagnosed with malignancies including but not limited to lung cancer (Pathak et al. Clin Chem 52:1833-1842 [2006]), prostate cancer (Schwarzenbach et al. Clin Cancer Res 15:1032-8 [2009]), and breast cancer (Schwarzenbach et al. available online at breast-cancer-research.com/content/11/5/R71 [2009]). Identification of genomic instabilities associated with cancers that can be determined in the circulating cfDNA in cancer patients is a potential diagnostic and prognostic tool. In one embodiment, the method of the invention assesses CNV of a sequence of interest in a sample comprising a mixture of nucleic acids derived from a subject that is suspected or is known to have cancer, e.g. carcinoma, sarcoma, lymphoma, leukemia, germ cell tumors and blastoma. In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood that comprises a mixture of cfDNA derived from normal and cancerous cells. In another embodiment, the biological sample that is needed to determine whether a CNV is present is derived from a mixture of cancerous and non-cancerous cells from other biological fluids including but not limited to serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukophoresis samples, or in tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.

The sequence of interest is a nucleic acid sequence that is known or is suspected to play a role in the development and/or progression of the cancer. Examples of a sequence of interest include nucleic acid sequences, i.e., complete chromosomes and/or segments of chromosomes, that are amplified or deleted in cancerous cells, as described in the following.

In one embodiment, the present method can be used to determine the presence or absence of a chromosomal amplification. In some embodiments, the chromosomal amplification is the gain of one or more entire chromosomes. In other embodiments, the chromosomal amplification is the gain of one or more segments of a chromosome. In yet other embodiments, the chromosomal amplification is the gain of two or more segments of two or more chromosomes. The chromosomal amplification can involve the gain of one or more oncogenes.

Dominantly acting genes associated with human solid tumors typically exert their effect by overexpression or altered expression. Gene amplification is a common mechanism leading to upregulation of gene expression. Evidence from cytogenetic studies indicates that significant amplification occurs in over 50% of human breast cancers. Most notably, the amplification of the proto-oncogene human epidermal growth factor receptor 2 (HER2), located on chromosome 17 (17q21-q22), results in overexpression of HER2 receptors on the cell surface, leading to excessive and dysregulated signaling in breast cancer and other malignancies (Park et al., Clinical Breast Cancer 8:392-401 [2008]). A variety of oncogenes have been found to be amplified in other human malignancies. Examples of the amplification of cellular oncogenes in human tumors include amplifications of: c-myc in the promyelocytic leukemia cell line HL60 and in small-cell lung carcinoma cell lines; N-myc in primary neuroblastomas (stages III and IV), neuroblastoma cell lines, a retinoblastoma cell line and primary tumors, and small-cell lung carcinoma lines and tumors; L-myc in small-cell lung carcinoma cell lines and tumors; c-myb in acute myeloid leukemia and in colon carcinoma cell lines; c-erbb in epidermoid carcinoma cells and primary gliomas; c-K-ras-2 in primary carcinomas of lung, colon, bladder, and rectum; and N-ras in a mammary carcinoma cell line (Varmus H., Ann Rev Genetics 18:553-612 [1984], cited in Watson et al., Molecular Biology of the Gene (4th ed.; Benjamin/Cummings Publishing Co. 1987)).

In one embodiment, the present method can be used to determine the presence or absence of a chromosomal deletion. In some embodiments, the chromosomal deletion is the loss of one or more entire chromosomes. In other embodiments, the chromosomal deletion is the loss of one or more segments of a chromosome. In yet other embodiments, the chromosomal deletion is the loss of two or more segments of two or more chromosomes. The chromosomal deletion can involve the loss of one or more tumor suppressor genes.

Chromosomal deletions involving tumor suppressor genes may play an important role in the development and progression of solid tumors. The retinoblastoma tumor suppressor gene (Rb-1), located on chromosome 13q14, is the most extensively characterized tumor suppressor gene. The Rb-1 gene product, a 105 kDa nuclear phosphoprotein, apparently plays an important role in cell cycle regulation (Howe et al., Proc Natl Acad Sci (USA) 87:5883-5887 [1990]). Altered or lost expression of the Rb protein is caused by inactivation of both gene alleles, either through a point mutation or a chromosomal deletion. Rb-1 gene alterations have been found not only in retinoblastomas but also in other malignancies such as osteosarcomas, small cell lung cancer (Rygaard et al., Cancer Res 50: 5312-5317 [1990]) and breast cancer. Restriction fragment length polymorphism (RFLP) studies have indicated that such tumor types have frequently lost heterozygosity at 13q, suggesting that one of the Rb-1 gene alleles has been lost due to a gross chromosomal deletion (Bowcock et al., Am J Hum Genet, 46: 12 [1990]). Chromosome 1 abnormalities, including duplications, deletions and unbalanced translocations involving chromosome 6 and other partner chromosomes, indicate that regions of chromosome 1, in particular 1q21-1q32 and 1p11-13, might harbor oncogenes or tumor suppressor genes that are pathogenetically relevant to both chronic and advanced phases of myeloproliferative neoplasms (Caramazza et al., Eur J Hematol 84:191-200 [2010]). Myeloproliferative neoplasms are also associated with deletions of chromosome 5. Complete loss or interstitial deletions of chromosome 5 are the most common karyotypic abnormality in myelodysplastic syndromes (MDSs). Isolated del(5q)/5q- MDS patients have a more favorable prognosis than those with additional karyotypic defects, who tend to develop myeloproliferative neoplasms (MPNs) and acute myeloid leukemia.
The frequency of unbalanced chromosome 5 deletions has led to the idea that 5q harbors one or more tumor-suppressor genes that have fundamental roles in the growth control of hematopoietic stem/progenitor cells (HSCs/HPCs). Cytogenetic mapping of commonly deleted regions (CDRs) centered on 5q31 and 5q32 identified candidate tumor-suppressor genes, including the ribosomal subunit RPS14, the transcription factor Egr1/Krox20 and the cytoskeletal remodeling protein alpha-catenin (Eisenmann et al., Oncogene 28:3429-3441 [2009]). Cytogenetic and allelotyping studies of fresh tumors and tumor cell lines have shown that allelic losses from several distinct regions on chromosome 3p, including 3p25, 3p21-22, 3p21.3, 3p12-13 and 3p14, are the earliest and most frequent genomic abnormalities involved in a wide spectrum of major epithelial cancers of lung, breast, kidney, head and neck, ovary, cervix, colon, pancreas, esophagus, bladder and other organs. Several tumor suppressor genes have been mapped to the chromosome 3p region, and it is thought that interstitial deletions or promoter hypermethylation precede the loss of 3p or of the entire chromosome 3 in the development of carcinomas (Angeloni D., Briefings Functional Genomics 6:19-39 [2007]). Newborns and children with Down syndrome (DS) often present with congenital transient leukemia and have an increased risk of acute myeloid leukemia and acute lymphoblastic leukemia. Chromosome 21, harboring about 300 genes, may be involved in numerous structural aberrations, e.g., translocations, deletions, and amplifications, in leukemias, lymphomas, and solid tumors. Moreover, genes located on chromosome 21 have been identified that play an important role in tumorigenesis. Somatic numerical as well as structural chromosome 21 aberrations are associated with leukemias, and specific genes including RUNX1, TMPRSS2, and TFF, which are located in 21q, play a role in tumorigenesis (Fonatsch C., Gene Chromosomes Cancer 49:497-508 [2010]).

In one embodiment, the method provides a means to assess the association between gene amplification and the extent of tumor evolution. Correlation between amplification and/or deletion and the stage or grade of a cancer may be prognostically important, because such information may contribute to the definition of a genetically based tumor grade that would better predict the future course of disease, with more advanced tumors having the worst prognosis. In addition, information about early amplification and/or deletion events may be useful in associating those events as predictors of subsequent disease progression. Gene amplifications and deletions as identified by the method can be associated with other known parameters such as tumor grade, histology, Brd/Urd labeling index, hormonal status, nodal involvement, tumor size, survival duration and other tumor properties available from epidemiological and biostatistical studies. For example, tumor DNA to be tested by the method could include atypical hyperplasia, ductal carcinoma in situ, stage I-III cancer and metastatic lymph nodes, in order to permit the identification of associations between amplifications and deletions and stage. Such associations may make effective therapeutic intervention possible. For example, consistently amplified regions may contain an overexpressed gene, the product of which may be able to be attacked therapeutically (for example, the growth factor receptor tyrosine kinase p185^(HER2)).

The method can be used to identify amplification and/or deletion events that are associated with drug resistance by comparing the copy number variation of nucleic acid sequences from primary cancers to that of cells that have metastasized to other sites. If gene amplification and/or deletion is a manifestation of karyotypic instability that allows rapid development of drug resistance, more amplification and/or deletion would be expected in primary tumors from chemoresistant patients than in tumors from chemosensitive patients. For example, if amplification of specific genes is responsible for the development of drug resistance, regions surrounding those genes would be expected to be amplified consistently in tumor cells from pleural effusions of chemoresistant patients but not in the primary tumors. Discovery of associations between gene amplification and/or deletion and the development of drug resistance may allow the identification of patients who will or will not benefit from adjuvant therapy.

In a manner similar to that described for determining the presence or absence of complete and/or partial fetal chromosomal aneuploidies in a maternal sample, the method of the invention can be used to determine the presence or absence of complete and/or partial chromosomal aneuploidies in any patient sample comprising nucleic acids, e.g., DNA or cfDNA (including patient samples that are not maternal samples). The patient sample can be any biological sample type as described elsewhere herein. Preferably, the sample is obtained by non-invasive procedures. For example, the sample can be a blood sample, or the serum and plasma fractions thereof. Alternatively, the sample can be a urine sample or a fecal sample. In yet other embodiments, the sample is a tissue biopsy sample. In all cases, the sample comprises nucleic acids, e.g., cfDNA or genomic DNA, which are purified and sequenced using any of the NGS sequencing methods described previously.

Both complete and partial chromosomal aneuploidies associated with the formation and progression of cancer can be determined according to the present method.

Determination of Complete Chromosomal Aneuploidies in Patient Samples

In one embodiment, the present invention provides a method for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. The steps of the method comprise (a) obtaining sequence information for the patient nucleic acids in the patient test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the any one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X, and Y. The method further uses in step (c) the number of sequence tags identified for each of the any one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the any one or more chromosomes of interest; and (d) compares each of the single chromosome doses for each of the any one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete patient chromosomal aneuploidies in the patient test sample.

In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest and the number of sequence tags identified for the normalizing chromosome for each of the chromosomes of interest.

In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence for the chromosome of interest to the length of the normalizing chromosome sequence, and calculating a chromosome dose for the chromosome of interest as a ratio of the sequence tag density of the chromosome of interest and the sequence tag density for the normalizing sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
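The two variants of step (c) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the patented implementation. The function names are hypothetical. The tag counts are assumed to be available already from step (b), and lengths may be in any consistent unit, such as bases.

```python
def ratio_chromosome_dose(tags_of_interest: int, tags_normalizing: int) -> float:
    """Chromosome dose as the ratio of raw tag counts (chromosome of interest / normalizing sequence)."""
    if tags_normalizing <= 0:
        raise ValueError("the normalizing sequence must have at least one tag")
    return tags_of_interest / tags_normalizing


def density_chromosome_dose(tags_of_interest: int, length_of_interest: int,
                            tags_normalizing: int, length_normalizing: int) -> float:
    """Chromosome dose as the ratio of tag densities (tags per unit length)."""
    density_of_interest = tags_of_interest / length_of_interest
    density_normalizing = tags_normalizing / length_normalizing
    return density_of_interest / density_normalizing


# Hypothetical counts and lengths, for illustration only.
print(ratio_chromosome_dose(1000, 4000))             # 0.25
print(density_chromosome_dose(1000, 50, 4000, 100))  # 0.5
```

The density form divides each tag count by the length of its own sequence. This keeps the dose from being dominated by the size difference between the chromosome of interest and its normalizing sequence.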

An example of the embodiment whereby one or more complete chromosomal aneuploidies are determined in a cancer patient test sample comprising cell-free DNA molecules comprises: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the patient cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of the twenty or more chromosomes of interest to a threshold value for each of the twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete chromosomal aneuploidies in the patient test sample.
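Steps (c) and (d), applied across many chromosomes of interest at once, might look like the following sketch. Several details are assumptions and do not come from the text. Each chromosome is given a two-sided (lower, upper) pair of threshold values, whereas the text speaks only of "a threshold value". The normalizer pairings and numbers are hypothetical. The function name `call_complete_aneuploidies` is illustrative. Covering twenty or more chromosomes only requires more dictionary entries.

```python
from typing import Dict, Tuple


def call_complete_aneuploidies(
    tag_counts: Dict[str, int],
    normalizers: Dict[str, Tuple[str, ...]],
    thresholds: Dict[str, Tuple[float, float]],
) -> Dict[str, str]:
    """Steps (c) and (d): compute a dose for each chromosome of interest and compare
    it with that chromosome's (lower, upper) threshold values (hypothetical)."""
    calls = {}
    for chrom, norm_group in normalizers.items():
        # The normalizing sequence may be one chromosome or a group of chromosomes.
        norm_tags = sum(tag_counts[c] for c in norm_group)
        dose = tag_counts[chrom] / norm_tags
        lower, upper = thresholds[chrom]
        if dose > upper:
            calls[chrom] = "gain"
        elif dose < lower:
            calls[chrom] = "loss"
        else:
            calls[chrom] = "within thresholds"
    return calls


# Hypothetical tag counts, normalizer assignments and thresholds.
counts = {"chr21": 60, "chr18": 130, "chr9": 200, "chr4": 200}
normalizers = {"chr21": ("chr9",), "chr18": ("chr4", "chr9")}
thresholds = {"chr21": (0.25, 0.28), "chr18": (0.30, 0.35)}
print(call_complete_aneuploidies(counts, normalizers, thresholds))
# {'chr21': 'gain', 'chr18': 'within thresholds'}
```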

In another embodiment, the method for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample as described above uses a normalizing segment sequence for determining the dose of the chromosome of interest. In this instance, the method comprises (a) obtaining sequence information for the nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further uses in step (c) the number of sequence tags identified for each of said any one or more chromosomes of interest and said number of sequence tags identified for said normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) compares each of said single chromosome doses for each of said any one or more chromosomes of interest to a threshold value for each of said one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete chromosomal aneuploidies in the patient sample.

In some embodiments, step (c) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest and the number of sequence tags identified for said normalizing segment sequence for each of said chromosomes of interest.

In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence for the chromosome of interest to the length of the normalizing segment sequence, and calculating a chromosome dose for the chromosome of interest as a ratio of the sequence tag density of the chromosome of interest and the sequence tag density for the normalizing segment sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
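When the normalizing segment sequence is a group of segments, possibly on different chromosomes, the group needs a single tag density. The sketch below pools the group as total tags divided by total length. That pooling rule is an assumption, not something the text specifies; averaging per-segment densities would be an alternative that weights short segments more heavily. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TaggedSequence:
    tags: int    # number of sequence tags mapped to the sequence
    length: int  # length of the sequence, e.g. in bases


def pooled_density(group: Sequence[TaggedSequence]) -> float:
    """Tag density of a group of segments, pooled as total tags / total length (assumed rule)."""
    total_tags = sum(s.tags for s in group)
    total_length = sum(s.length for s in group)
    return total_tags / total_length


def dose_with_segment_normalizer(chrom: TaggedSequence,
                                 normalizing_segments: Sequence[TaggedSequence]) -> float:
    """Tag density of the chromosome of interest divided by that of the normalizing segment sequence."""
    return (chrom.tags / chrom.length) / pooled_density(normalizing_segments)


# Hypothetical values: the normalizing segment sequence is two segments,
# which could lie on different chromosomes.
chrom_of_interest = TaggedSequence(tags=300, length=1000)
normalizer = [TaggedSequence(tags=100, length=500), TaggedSequence(tags=500, length=1500)]
print(dose_with_segment_normalizer(chrom_of_interest, normalizer))  # 1.0
```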

A means for comparing chromosome doses of different sample sets is provided by determining a normalized chromosome value (NCV), which relates the chromosome dose in a test sample to the mean and standard deviation of the corresponding chromosome dose in a set of qualified samples. The NCV is calculated as:

${NCV_{ij}} = \frac{x_{ij} - {\hat{\mu}}_{j}}{{\hat{\sigma}}_{j}}$

where $\hat{\mu}_{j}$ and $\hat{\sigma}_{j}$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
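The NCV formula translates directly into code. In this sketch, $\hat{\sigma}_{j}$ is estimated as the sample standard deviation with an n − 1 denominator. That choice is an assumption, because the text says only "estimated". The qualified-sample doses are hypothetical.

```python
import statistics
from typing import Sequence


def normalized_chromosome_value(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """NCV_ij = (x_ij - mu_j) / sigma_j, with mu_j and sigma_j estimated from the
    chromosome-j doses of the qualified samples."""
    mu_hat = statistics.mean(qualified_doses)
    sigma_hat = statistics.stdev(qualified_doses)  # sample standard deviation (n - 1)
    return (test_dose - mu_hat) / sigma_hat


# Hypothetical chromosome doses from five qualified samples.
qualified = [0.30, 0.31, 0.29, 0.30, 0.30]
print(round(normalized_chromosome_value(0.33, qualified), 2))  # 4.24
```

Because the NCV is expressed in units of the qualified-set standard deviation, chromosomes whose doses have very different magnitudes can be compared on one scale.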

In some embodiments, the presence or absence of one complete chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, or twenty-four complete chromosomal aneuploidies is determined in a sample. Twenty-two of these complete chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth correspond to complete chromosomal aneuploidies of chromosomes X and Y. Aneuploidies can comprise trisomies, tetrasomies, pentasomies and other polysomies, and the number of complete chromosomal aneuploidies varies in different diseases and in different stages of the same disease. Accordingly, the number of complete chromosomal aneuploidies that are determined according to the present method is at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100 or more. Systematic karyotyping of tumors has revealed that the chromosome number in cancer cells is highly variable, ranging from hypodiploidy (considerably fewer than 46 chromosomes) to tetraploidy and hypertetraploidy (up to 200 chromosomes) (Storchova and Kuffer J Cell Sci 121:3859-3866 [2008]). In some embodiments, the method comprises determining the presence or absence of up to 200 or more chromosomal aneuploidies in a sample from a patient suspected or known to be suffering from cancer, e.g., colon cancer. The chromosomal aneuploidies include losses of one or more complete chromosomes (hypodiploidies) and gains of complete chromosomes, including trisomies, tetrasomies, pentasomies, and other polysomies.
Gains and/or losses of segments of chromosomes can also be determined as described elsewhere herein. The method is applicable to determining the presence or absence of different aneuploidies in samples from patients suspected or known to be suffering from any cancer as described elsewhere herein.

In some embodiments, any one of chromosomes 1-22, X and Y can be the chromosome of interest in determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample as described above. In other embodiments, two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the presence or absence of at least twenty different complete chromosomal aneuploidies is determined. In other embodiments, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and the presence or absence of complete chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. Different complete chromosomal aneuploidies that can be determined include complete chromosomal monosomies of any one or more of chromosomes 1-22, X and Y; complete chromosomal trisomies of any one or more of chromosomes 1-22, X and Y; complete chromosomal tetrasomies of any one or more of chromosomes 1-22, X and Y; complete chromosomal pentasomies of any one or more of chromosomes 1-22, X and Y; and other complete chromosomal polysomies of any one or more of chromosomes 1-22, X and Y.

Determination of Partial Chromosomal Aneuploidies in Patient Samples

In another embodiment, the invention provides a method for determining the presence or absence of any one or more different partial chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. The steps of the method comprise (a) obtaining sequence information for the patient nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of any one or more segments of any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further uses in step (c) the number of sequence tags identified for each of any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single segment dose for each of any one or more segments of any one or more chromosomes of interest; and (d) compares each of the single segment doses for each of any one or more segments of any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomal segments of any one or more chromosomes of interest, thereby determining the presence or absence of one or more different partial chromosomal aneuploidies in said sample.
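Step (b) for segments requires counting the sequence tags that fall within each segment. The sketch below makes several assumptions. Each tag is represented by a single mapped coordinate, such as its alignment start. Each segment is a half-open interval on one chromosome. The positions for each chromosome are already sorted, so `bisect_left` from the Python standard library can count them. The data and function names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def count_tags(tag_positions: Dict[str, List[int]], segment: Segment) -> int:
    """Number of tags whose mapped position lies in the segment; positions must be sorted."""
    chrom, start, end = segment
    positions = tag_positions.get(chrom, [])
    return bisect_left(positions, end) - bisect_left(positions, start)


def segment_dose(tag_positions: Dict[str, List[int]], segment_of_interest: Segment,
                 normalizing_segments: Sequence[Segment]) -> float:
    """Tags in the segment of interest divided by tags in the normalizing segment sequence."""
    norm_tags = sum(count_tags(tag_positions, s) for s in normalizing_segments)
    return count_tags(tag_positions, segment_of_interest) / norm_tags


# Hypothetical mapped tag positions (sorted) per chromosome.
positions = {"chr5": [10, 20, 30, 150, 160], "chr2": [5, 15, 25, 35]}
print(segment_dose(positions, ("chr5", 0, 100), [("chr2", 0, 40), ("chr5", 100, 200)]))  # 0.5
```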

In some embodiments, step (c) comprises calculating a single segment dose for each of any one or more segments of any one or more chromosomes of interest as the ratio of the number of sequence tags identified for each of any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence for each of any one or more segments of any one or more chromosomes of interest.

In other embodiments, step (c) comprises calculating a sequence tag ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence for the segment of interest to the length of the normalizing segment sequence, and calculating a segment dose for the segment of interest as a ratio of the sequence tag density of the segment of interest and the sequence tag density for the normalizing segment sequence. The calculation is repeated for each of the segments of interest. Steps (a)-(d) can be repeated for test samples from different patients.

A means for comparing segment doses of different sample sets is provided by determining a normalized segment value (NSV), which relates the segment dose in a test sample to the mean and standard deviation of the corresponding segment dose in a set of qualified samples. The NSV is calculated as:

${NSV_{ij}} = \frac{x_{ij} - {\hat{\mu}}_{j}}{{\hat{\sigma}}_{j}}$

where $\hat{\mu}_{j}$ and $\hat{\sigma}_{j}$ are the estimated mean and standard deviation, respectively, for the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
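The sketch below computes an NSV for each segment of a test sample and flags values beyond a cutoff. Several parts are assumptions. The cutoff of 3.0 is a hypothetical placeholder, because the text does not give a threshold here. The sample standard deviation is used as the estimator. The segment labels and doses are invented for illustration. Reading a positive NSV as a gain and a negative one as a loss also assumes that the normalizing segment sequence itself is unaffected.

```python
import statistics
from typing import Dict, List


def call_partial_aneuploidies(test_doses: Dict[str, float],
                              qualified_doses: Dict[str, List[float]],
                              cutoff: float = 3.0) -> Dict[str, str]:
    """Compute NSV_ij for each segment j of test sample i and flag |NSV| > cutoff.

    The default cutoff of 3.0 is a hypothetical placeholder, not a value from the text.
    """
    calls = {}
    for segment, x in test_doses.items():
        mu_hat = statistics.mean(qualified_doses[segment])
        sigma_hat = statistics.stdev(qualified_doses[segment])
        nsv = (x - mu_hat) / sigma_hat
        if nsv > cutoff:
            calls[segment] = "partial gain"
        elif nsv < -cutoff:
            calls[segment] = "partial loss"
        else:
            calls[segment] = "within cutoff"
    return calls


# Hypothetical segment doses from three qualified samples.
qualified = {"5q31": [0.10, 0.11, 0.12], "3p14": [0.10, 0.11, 0.12]}
print(call_partial_aneuploidies({"5q31": 0.05, "3p14": 0.115}, qualified))
# {'5q31': 'partial loss', '3p14': 'within cutoff'}
```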

In some embodiments, the presence or absence of one partial chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, or more partial chromosomal aneuploidies is determined in a sample. In one embodiment, one segment of interest is selected from any one of chromosomes 1-22, X, and Y. In another embodiment, two or more segments of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more segments of interest selected from chromosomes 1-22, X, and Y comprise at least one, five, ten, 15, 20, 25, 50, 75, 100 or more segments selected from chromosomes 1-22, X, and Y, and the presence or absence of at least one, five, ten, 15, 20, 25, 50, 75, 100, or more different partial chromosomal aneuploidies is determined. Different partial chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions and partial deletions.

Samples that can be used for determining the presence or absence of a chromosomal aneuploidy (partial or complete) in a patient can be any of the biological samples described elsewhere herein. The type of sample or samples that can be used for the determination of aneuploidy in a patient will depend on the type of disease from which the patient is known or suspected to be suffering. For example, a stool sample can be chosen as a source of DNA to determine the presence or absence of aneuploidies associated with colorectal cancer. The method is also applicable to tissue samples as described herein. Preferably, the sample is a biological sample that is obtained by non-invasive means, e.g., a plasma sample. Sequencing of the nucleic acids in the patient sample can be performed using next generation sequencing (NGS) as described elsewhere herein. In some embodiments, sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, sequencing is sequencing-by-ligation. In yet other embodiments, sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.

In some embodiments, the presence or absence of an aneuploidy is determined in a patient suspected to be suffering from a cancer as described elsewhere herein, e.g., cancers of the lung, breast, kidney, head and neck, ovary, cervix, colon, pancreas, esophagus, bladder and other organs, and blood cancers. Blood cancers include cancers of the bone marrow, blood, and lymphatic system, which includes lymph nodes, lymphatic vessels, tonsils, thymus, spleen, and digestive tract lymphoid tissue. Leukemia and myeloma, which start in the bone marrow, and lymphoma, which starts in the lymphatic system, are the most common types of blood cancer.

The determination of the presence or absence of one or more chromosomal aneuploidies in a patient sample can be made, without limitation, to determine the predisposition of the patient to a particular cancer, to determine the presence or absence of a cancer as part of a routine screen in patients known and not known to be predisposed to the cancer in question, to provide a prognosis for the disease, to assess the need for adjuvant therapy, and to determine the progression or regression of the disease.

Apparatus and Systems for Determining CNV

Analysis of the sequencing data and the diagnosis derived therefrom are typically performed using various computer algorithms and programs. In one embodiment, the invention provides a computer program product for generating an output indicating the presence or absence of a fetal aneuploidy in a test sample. The computer program product comprises a computer-readable medium having computer-executable logic recorded thereon for enabling a processor to diagnose a fetal aneuploidy, comprising: a receiving procedure for receiving sequencing data from at least a portion of nucleic acid molecules from a maternal biological sample, wherein said sequencing data comprises a calculated chromosome dose; computer-assisted logic for analyzing a fetal aneuploidy from said received data; and an output procedure for generating an output indicating the presence, absence or kind of said fetal aneuploidy.
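The three components of the computer program product (a receiving procedure, analysis logic, and an output procedure) could be organized as the three functions in the sketch below. It rests on several assumptions. The received data are calculated chromosome doses arriving as text. The analysis uses a normalized value against hypothetical reference doses with a hypothetical cutoff. The "kind" of aneuploidy is reported simply as a gain or a loss. None of the function names comes from the source.

```python
import statistics
from typing import Dict, List


def receiving_procedure(raw: Dict[str, str]) -> Dict[str, float]:
    """Receive calculated chromosome doses (assumed here to arrive as text values)."""
    return {chrom: float(value) for chrom, value in raw.items()}


def analysis_logic(doses: Dict[str, float], reference: Dict[str, List[float]],
                   cutoff: float = 3.0) -> Dict[str, str]:
    """Classify each chromosome by its normalized value against hypothetical reference doses."""
    result = {}
    for chrom, x in doses.items():
        z = (x - statistics.mean(reference[chrom])) / statistics.stdev(reference[chrom])
        result[chrom] = "gain" if z > cutoff else "loss" if z < -cutoff else "unaffected"
    return result


def output_procedure(calls: Dict[str, str]) -> str:
    """Produce one report line per chromosome, in sorted order."""
    return "\n".join(f"{chrom}: {call}" for chrom, call in sorted(calls.items()))


reference = {"chr21": [0.10, 0.11, 0.12], "chr18": [0.20, 0.21, 0.22]}
received = receiving_procedure({"chr21": "0.15", "chr18": "0.21"})
print(output_procedure(analysis_logic(received, reference)))
# chr18: unaffected
# chr21: gain
```

Keeping the three procedures separate means each one can be replaced independently. For example, a different input format needs only a new receiving procedure.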

The method of the invention can be performed using a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying any CNV, e.g., chromosomal or partial aneuploidies. Thus, in one embodiment, the invention provides a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying complete and partial chromosomal aneuploidies, e.g., fetal aneuploidies.

The method of the invention can also be performed using a computer processing system that is adapted or configured to perform a method for identifying any CNV, e.g., chromosomal or partial aneuploidies. Thus, in one embodiment, the invention provides a computer processing system that is adapted or configured to perform a method as described herein. In one embodiment, the apparatus comprises a sequencing device adapted or configured for sequencing at least a portion of the nucleic acid molecules in a sample to obtain the type of sequence information described elsewhere herein.

The present invention is described in further detail in the following Examples, which are not in any way intended to limit the scope of the invention as claimed. The attached Figures are meant to be considered as integral parts of the specification and description of the invention. The following examples are offered to illustrate, but not to limit, the claimed invention.

EXPERIMENTAL

Example 1

Sample Processing and DNA Extraction

Peripheral blood samples were collected from pregnant women in their first or second trimester of pregnancy who were deemed at risk for fetal aneuploidy. Informed consent was obtained from each participant prior to the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed using the chorionic villus or amniocentesis samples to confirm fetal karyotype.

Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (approximately 6-9 mL/tube) was transferred into one 15-mL low speed centrifuge tube. Blood was centrifuged at 2640 rpm, 4° C. for 10 min using a Beckman Allegra 6 R centrifuge and rotor model GA3.8. For cell-free plasma extraction, the upper plasma layer was transferred to a 15-mL high speed centrifuge tube and centrifuged at 16000×g, 4° C. for 10 min using a Beckman Coulter Avanti J-E centrifuge and JA-14 rotor. The two centrifugation steps were performed within 72 h after blood collection. Cell-free plasma was stored at −80° C. and thawed only once before DNA extraction.

Cell-free DNA was extracted from cell-free plasma using the QIAamp DNA Blood Mini kit (Qiagen) according to the manufacturer's instructions. Five milliliters of buffer AL and 500 μl of Qiagen Protease were added to 4.5-5 ml of cell-free plasma. The volume was adjusted to 10 ml with phosphate buffered saline (PBS), and the mixture was incubated at 56° C. for 12 minutes. Multiple columns were used to separate the precipitated cfDNA from the solution by centrifugation at 8,000 RPM in a Beckman microcentrifuge. The columns were washed with AW1 and AW2 buffers, and the cfDNA was eluted with 55 μl of nuclease-free water. Approximately 3.5-7 ng of cfDNA was extracted from the plasma samples.
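A quick calculation links the reported cfDNA yield to the library input of approximately 2 ng used in the library preparation step. It assumes that the full 3.5-7 ng is recovered in the 55 μl eluate and that the eluate is uniformly mixed; neither point is stated in the text. The helper names are hypothetical.

```python
def concentration_ng_per_ul(mass_ng: float, volume_ul: float) -> float:
    """Mass concentration of the eluate, assuming uniform mixing."""
    return mass_ng / volume_ul


def volume_for_input_ul(target_ng: float, conc_ng_per_ul: float) -> float:
    """Eluate volume needed to deliver a target mass of cfDNA."""
    return target_ng / conc_ng_per_ul


ELUTION_UL = 55.0  # elution volume stated above
TARGET_NG = 2.0    # approximate library input used for library preparation
for yield_ng in (3.5, 7.0):  # reported range of extracted cfDNA
    conc = concentration_ng_per_ul(yield_ng, ELUTION_UL)
    print(f"{yield_ng} ng -> {conc:.3f} ng/ul, {volume_for_input_ul(TARGET_NG, conc):.1f} ul for 2 ng")
```

Under these assumptions, about 15.7 to 31.4 μl of eluate would supply 2 ng. That volume fits within the 40 μl in which the cfDNA fragments enter the end-repair reaction.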

All sequencing libraries were prepared from approximately 2 ng of purified cfDNA that was extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, Mass.) for Illumina® as follows. Because cell-free plasma DNA is fragmented in nature, no further fragmentation by nebulization or sonication was done on the plasma DNA samples. The overhangs of approximately 2 ng purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext® End Repair Module by incubating, in a 1.5 ml microfuge tube, the cfDNA with 5 μl 10× phosphorylation buffer, 2 μl deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl T4 DNA Polymerase and 1 μl T4 Polynucleotide Kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1 for 15 minutes at 20° C. The enzymes were then heat inactivated by incubating the reaction mixture at 75° C. for 5 minutes. The mixture was cooled to 4° C., and dA tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3′ to 5′ exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1), and incubating for 15 minutes at 37° C. Subsequently, the Klenow fragment was heat inactivated by incubating the reaction mixture at 75° C. for 5 minutes. Following the inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, Calif.) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, by incubating the reaction mixture for 15 minutes at 25° C.
The mixture was cooled to 4° C., and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers, and other reagents using magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, Mass.). Eighteen cycles of PCR were performed to selectively enrich adaptor-ligated cfDNA using Phusion® High-Fidelity Master Mix (Finnzymes, Woburn, Mass.) and Illumina's PCR primers complementary to the adaptors (Part Nos. 1000537 and 1000538). The adaptor-ligated DNA was subjected to PCR (98° C. for 30 seconds; 18 cycles of 98° C. for 10 seconds, 65° C. for 30 seconds, and 72° C. for 30 seconds; final extension at 72° C. for 5 minutes, and hold at 4° C.) using Illumina Genomic PCR Primers (Part Nos. 1000537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, Mass.) according to the manufacturer's instructions available on the worldwide web at beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB Buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, Calif.).

The amplified DNA was sequenced using Illumina's Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information are needed to identify a sequence as belonging to a specific human chromosome; longer sequences can uniquely identify more particular targets. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome. Upon completion of sequencing of the sample, the Illumina “Sequencer Control Software” transferred image and base call files to a Unix server running the Illumina “Genome Analyzer Pipeline” software version 1.51. The Illumina “Gerald” program was run to align sequences to the reference human genome that is derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available on the worldwide web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105). The sequence data generated from the above procedure that uniquely aligned to the genome were read from the Gerald output (export.txt files) by a program (c2c.pl) running on a computer running the Linux operating system. Sequence alignments with base mismatches were allowed, and were included in alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded.
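The read-filtering rules above (count only uniquely aligned reads, and drop duplicates that share identical start and end coordinates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the c2c.pl program: the `Alignment` record and its fields are hypothetical stand-ins for whatever a parser of the Gerald export.txt files would yield, and the two-mismatch ceiling is borrowed from the tag counts reported for these samples.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not the Gerald export.txt schema)."""
    chrom: str
    start: int
    end: int
    unique: bool      # True if the read aligned to exactly one genomic location
    mismatches: int   # number of base mismatches in the alignment


def filter_alignments(alignments: Iterable[Alignment],
                      max_mismatches: int = 2) -> List[Alignment]:
    """Keep uniquely aligned reads with at most `max_mismatches` mismatches and
    drop duplicates that share chromosome, start and end coordinates."""
    seen = set()
    kept = []
    for aln in alignments:
        if not aln.unique or aln.mismatches > max_mismatches:
            continue  # ambiguous alignment or too many mismatches
        key = (aln.chrom, aln.start, aln.end)
        if key in seen:
            continue  # duplicate: identical start and end coordinates
        seen.add(key)
        kept.append(aln)
    return kept


reads = [
    Alignment("chr21", 100, 135, True, 0),
    Alignment("chr21", 100, 135, True, 1),   # duplicate coordinates: dropped
    Alignment("chr21", 200, 235, False, 0),  # not unique: dropped
    Alignment("chr14", 50, 85, True, 3),     # three mismatches: dropped
]
print(len(filter_alignments(reads)))  # 1
```

The first read seen at a given coordinate pair is kept and later copies are discarded. That ordering choice is an assumption, because the text only states that duplicates were excluded.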

Between about 5 and 15 million 36 bp tags with 2 or fewer mismatches were mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both test and qualifying samples. Regions extending from base 0 to base 2×10⁶, base 10×10⁶ to base 13×10⁶, and base 23×10⁶ to the end of chromosome Y were specifically excluded from the analysis, because tags derived from either male or female fetuses map to these regions of the Y chromosome. Some variation was noted in the total number of sequence tags mapped to individual chromosomes across samples sequenced in the same run (inter-chromosomal variation), but substantially greater variation was noted among different sequencing runs (inter-sequencing run variation).
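Counting mapped tags per chromosome while skipping the excluded chromosome Y regions can be sketched as below. This assumes, for illustration only, that each tag is represented as a hypothetical `(chromosome, start position)` pair, that a tag is assigned by its start position, and that the excluded intervals are half-open. The text does not specify boundary handling.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

MB = 1_000_000

# Chromosome Y regions excluded from the analysis, treated here as half-open
# [start, end) intervals (an assumption). None means "to the end of chrY".
Y_EXCLUDED: Tuple[Tuple[int, Optional[int]], ...] = (
    (0, 2 * MB),
    (10 * MB, 13 * MB),
    (23 * MB, None),
)


def in_excluded_y(chrom: str, pos: int) -> bool:
    """True if a tag starting at `pos` falls in an excluded chrY region."""
    if chrom != "chrY":
        return False
    return any(pos >= lo and (hi is None or pos < hi) for lo, hi in Y_EXCLUDED)


def tag_density(tags: Iterable[Tuple[str, int]]) -> Counter:
    """Count mapped tags per chromosome (the sequence tag density),
    skipping tags that fall in the excluded chrY regions."""
    return Counter(chrom for chrom, pos in tags if not in_excluded_y(chrom, pos))


tags = [("chr21", 5), ("chr21", 10), ("chrY", 1 * MB), ("chrY", 5 * MB),
        ("chrY", 12 * MB), ("chrY", 30 * MB)]
counts = tag_density(tags)
print(counts["chr21"], counts["chrY"])  # 2 1
```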

Example 2 Dose and Variance for Chromosomes 13, 18, 21, X, and Y

To examine the extent of inter-chromosomal and inter-sequencing variation in the number of mapped sequence tags for all chromosomes, plasma cfDNA obtained from peripheral blood of 48 volunteer pregnant subjects was extracted and sequenced as described in Example 1, and analyzed as follows.

The total number of sequence tags that were mapped to each chromosome (sequence tag density) was determined. Alternatively, the number of mapped sequence tags may be normalized to the length of the chromosome to generate a sequence tag density ratio. Normalization to chromosome length is not a required step. It can be performed solely to reduce the number of digits in a number, which simplifies it for human interpretation. Chromosome lengths that can be used to normalize the sequence tag counts are provided on the world wide web at genome.ucsc.edu/goldenPath/stats.html#hg18.
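Length normalization is optional because it rescales every dose for a given pair of chromosomes by the same constant (the ratio of the two lengths). A constant factor leaves the coefficient of variation unchanged. The short check below illustrates this. All tag counts and chromosome lengths in it are hypothetical placeholders, not hg18 values or data from these samples.

```python
import math
from statistics import mean, stdev


def per_megabase(tag_count: int, length_bp: int) -> float:
    """Sequence tag density ratio: tags per megabase of chromosome length."""
    return tag_count * 1_000_000 / length_bp


def cv_percent(values) -> float:
    """Coefficient of variation, in percent."""
    return 100 * stdev(values) / mean(values)


# Hypothetical counts for one numerator and one denominator chromosome in
# three samples, and hypothetical chromosome lengths.
num_counts = [333_660, 340_100, 328_900]
den_counts = [795_050, 810_200, 781_400]
LEN_NUM, LEN_DEN = 47_000_000, 106_000_000

raw_doses = [n / d for n, d in zip(num_counts, den_counts)]
scaled_doses = [per_megabase(n, LEN_NUM) / per_megabase(d, LEN_DEN)
                for n, d in zip(num_counts, den_counts)]

# Every scaled dose equals the raw dose times LEN_DEN / LEN_NUM, so the CV
# (spread relative to the mean) is the same up to floating-point rounding.
print(math.isclose(cv_percent(raw_doses), cv_percent(scaled_doses), rel_tol=1e-9))  # True
```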

The resulting sequence tag density for each chromosome was related to the sequence tag density of each of the remaining chromosomes to derive a qualified chromosome dose. This dose was calculated as the ratio of the sequence tag density for the chromosome of interest (e.g. chromosome 21) to the sequence tag density of each of the remaining chromosomes (i.e. chromosomes 1-20, 22 and X). Table 1 provides an example of the calculated qualified chromosome dose for chromosomes of interest 13, 18, 21, X, and Y, determined in one of the qualified samples. Chromosome doses were determined for all chromosomes in all samples. The average doses for chromosomes of interest 13, 18, 21, X and Y in the qualified samples are provided in Tables 2 and 3, and are depicted in FIGS. 2-6. FIGS. 2-6 also depict the chromosome doses for the test samples. The chromosome doses for each of the chromosomes of interest in the qualified samples provide a measure of the variation in the total number of mapped sequence tags for each chromosome of interest relative to that of each of the remaining chromosomes. Thus, qualified chromosome doses can identify the chromosome or group of chromosomes (i.e. the normalizing chromosome) whose variation among samples is closest to the variation of the chromosome of interest, and which would therefore serve as the ideal sequence for normalizing values for further statistical evaluation. FIGS. 7 and 8 depict the calculated average chromosome doses determined in a population of qualified samples for chromosomes 13, 18, and 21, and for chromosomes X and Y.
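The all-pairs dose calculation described above can be sketched as a small function. Rows are the denominator (normalizing) chromosomes and columns are the chromosomes of interest, which matches Table 1. For example, the chr21 row and chr18 column hold 2.046668, the reciprocal of the 0.488599 in the chr18 row and chr21 column. The function name is hypothetical. The input densities are the sample #11403 values listed in Table 4.

```python
from typing import Dict


def dose_matrix(density: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Return dose[denominator][numerator] = density[numerator] / density[denominator].

    Rows are normalizing (denominator) chromosomes and columns are chromosomes
    of interest, as in Table 1; the diagonal is 1.
    """
    return {den: {num: density[num] / density[den] for num in density}
            for den in density}


# Sequence tag densities for test sample #11403, taken from Table 4.
density = {"chr21": 333_660, "chr14": 795_050, "chr15": 756_533}
doses = dose_matrix(density)
print(round(doses["chr14"]["chr21"], 6))  # 0.419672
print(round(doses["chr15"]["chr21"], 6))  # 0.441038
```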

In some instances, the best normalizing chromosome may not have the lowest variation but may instead have the greatest differentiability, i.e. a distribution of qualified doses that best distinguishes a test sample or samples from the qualified samples. Thus, differentiability accounts for both the variation in chromosome dose and the distribution of the doses in the qualified samples.

Tables 2 and 3 provide the coefficient of variation (CV) as the measure of variability, and Student's t-test values as a measure of differentiability, for chromosomes 18, 21, X and Y. The smaller the t-test value, the greater the differentiability. The differentiability for chromosome 13 was determined as a ratio. The numerator is the difference between the mean chromosome dose in the qualified samples and the dose for chromosome 13 in the only T13 test sample. The denominator is the standard deviation of the qualified doses.
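The two chromosome 13 measures above, CV and differentiability, can be sketched as shown below. The differentiability formula follows the description in the text. Applied to the chr5 column of Table 3 (mean 0.51010, SD 0.007922) and the T13 dose in Table 6 (0.541343), it returns the tabulated 3.944. The function names are hypothetical. The Student's t-test used for the other chromosomes is not reproduced here. A library routine such as SciPy's `scipy.stats.ttest_ind` could serve that purpose, but which variant the authors used is not stated.

```python
from statistics import mean, stdev


def cv_percent(qualified_doses) -> float:
    """Coefficient of variation (%) of the doses in the qualified samples."""
    return 100 * stdev(qualified_doses) / mean(qualified_doses)


def differentiability(mean_dose: float, sd_dose: float, test_dose: float) -> float:
    """|test dose - mean qualified dose| divided by the SD of the qualified doses."""
    return abs(test_dose - mean_dose) / sd_dose


# Chr13 normalized by chr5: qualified mean and SD from Table 3,
# T13 test-sample dose from Table 6.
print(round(differentiability(0.51010, 0.007922, 0.541343), 3))  # 3.944
```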

The qualified chromosome doses also serve as the basis for determining threshold values when identifying aneuploidies in test samples, as described in the following.

TABLE 1 Qualified Chromosome Dose for Chromosomes 13, 18, 21, X and Y (n = 1; sample #11342, 46 XY)

| Chromosome | chr 21 | chr 18 | chr 13 | chr X | chr Y |
|---|---|---|---|---|---|
| chr1 | 0.149901 | 0.306798 | 0.341832 | 0.490969 | 0.003958 |
| chr2 | 0.15413 | 0.315452 | 0.351475 | 0.504819 | 0.004069 |
| chr3 | 0.193331 | 0.395685 | 0.44087 | 0.633214 | 0.005104 |
| chr4 | 0.233056 | 0.476988 | 0.531457 | 0.763324 | 0.006153 |
| chr5 | 0.219209 | 0.448649 | 0.499882 | 0.717973 | 0.005787 |
| chr6 | 0.228548 | 0.467763 | 0.521179 | 0.748561 | 0.006034 |
| chr7 | 0.245124 | 0.501688 | 0.558978 | 0.802851 | 0.006472 |
| chr8 | 0.256279 | 0.524519 | 0.584416 | 0.839388 | 0.006766 |
| chr9 | 0.309871 | 0.634203 | 0.706625 | 1.014915 | 0.008181 |
| chr10 | 0.25122 | 0.514164 | 0.572879 | 0.822817 | 0.006633 |
| chr11 | 0.257168 | 0.526338 | 0.586443 | 0.8423 | 0.00679 |
| chr12 | 0.275192 | 0.563227 | 0.627544 | 0.901332 | 0.007265 |
| chr13 | 0.438522 | 0.897509 | 1 | 1.436285 | 0.011578 |
| chr14 | 0.405957 | 0.830858 | 0.925738 | 1.329624 | 0.010718 |
| chr15 | 0.406855 | 0.832697 | 0.927786 | 1.332566 | 0.010742 |
| chr16 | 0.376148 | 0.769849 | 0.857762 | 1.231991 | 0.009931 |
| chr17 | 0.383027 | 0.783928 | 0.873448 | 1.254521 | 0.010112 |
| chr18 | 0.488599 | 1 | 1.114194 | 1.600301 | 0.0129 |
| chr19 | 0.535867 | 1.096742 | 1.221984 | 1.755118 | 0.014148 |
| chr20 | 0.467308 | 0.956424 | 1.065642 | 1.530566 | 0.012338 |
| chr21 | 1 | 2.046668 | 2.280386 | 3.275285 | 0.026401 |
| chr22 | 0.756263 | 1.547819 | 1.724572 | 2.476977 | 0.019966 |
| chrX | 0.305317 | 0.624882 | 0.696241 | 1 | 0.008061 |
| chrY | 37.87675 | 77.52114 | 86.37362 | 124.0572 | 1 |

TABLE 2 Qualified Chromosome Dose, Variance and Differentiability for chromosomes 21, 18 and 13

| Chromosome | Avg (chr 21, n = 35) | Stdev | CV | T Test | Avg (chr 18, n = 40) | Stdev | CV | T Test |
|---|---|---|---|---|---|---|---|---|
| chr1 | 0.15335 | 0.001997 | 1.30 | 3.18E−10 | 0.31941 | 0.008384 | 2.62 | 0.001675 |
| chr2 | 0.15267 | 0.001966 | 1.29 | 9.87E−07 | 0.31807 | 0.001756 | 0.55 | 4.39E−05 |
| chr3 | 0.18936 | 0.004233 | 2.24 | 1.04E−05 | 0.39475 | 0.002406 | 0.61 | 3.39E−05 |
| chr4 | 0.21998 | 0.010668 | 4.85 | 0.000501 | 0.45873 | 0.014292 | 3.12 | 0.001349 |
| chr5 | 0.21383 | 0.005058 | 2.37 | 1.43E−05 | 0.44582 | 0.003288 | 0.74 | 3.09E−05 |
| chr6 | 0.22435 | 0.005258 | 2.34 | 1.48E−05 | 0.46761 | 0.003481 | 0.74 | 2.32E−05 |
| chr7 | 0.24348 | 0.002298 | 0.94 | 2.05E−07 | 0.50765 | 0.004669 | 0.92 | 9.07E−05 |
| chr8 | 0.25269 | 0.003497 | 1.38 | 1.52E−06 | 0.52677 | 0.002046 | 0.39 | 4.89E−05 |
| chr9 | 0.31276 | 0.003095 | 0.99 | 3.83E−09 | 0.65165 | 0.013851 | 2.13 | 0.000559 |
| chr10 | 0.25618 | 0.003112 | 1.21 | 2.28E−10 | 0.53354 | 0.013431 | 2.52 | 0.002137 |
| chr11 | 0.26075 | 0.00247 | 0.95 | 1.08E−09 | 0.54324 | 0.012859 | 2.37 | 0.000998 |
| chr12 | 0.27563 | 0.002316 | 0.84 | 2.04E−07 | 0.57445 | 0.006495 | 1.13 | 0.000125 |
| chr13 | 0.41828 | 0.016782 | 4.01 | 0.000123 | 0.87245 | 0.020942 | 2.40 | 0.000164 |
| chr14 | 0.40671 | 0.002994 | 0.74 | 7.33E−08 | 0.84731 | 0.010864 | 1.28 | 0.000149 |
| chr15 | 0.41861 | 0.007686 | 1.84 | 1.85E−10 | 0.87164 | 0.027373 | 3.14 | 0.003862 |
| chr16 | 0.39977 | 0.018882 | 4.72 | 7.33E−06 | 0.83313 | 0.050781 | 6.10 | 0.075458 |
| chr17 | 0.41394 | 0.02313 | 5.59 | 0.000248 | 0.86165 | 0.060048 | 6.97 | 0.088579 |
| chr18 | 0.47236 | 0.016627 | 3.52 | 1.3E−07 | | | | |
| chr19 | 0.59435 | 0.05064 | 8.52 | 0.01494 | 1.23932 | 0.12315 | 9.94 | 0.231139 |
| chr20 | 0.49464 | 0.021839 | 4.42 | 2.16E−06 | 1.03023 | 0.058995 | 5.73 | 0.061101 |
| chr21 | | | | | 2.03419 | 0.08841 | 4.35 | 2.81E−05 |
| chr22 | 0.84824 | 0.070613 | 8.32 | 0.02209 | 1.76258 | 0.169864 | 9.64 | 0.181808 |
| chrX | 0.27846 | 0.015546 | 5.58 | 0.000213 | 0.58691 | 0.026637 | 4.54 | 0.064883 |

TABLE 3 Qualified Chromosome Dose, Variance and Differentiability for chromosomes 13, X, and Y

| Chromosome | Avg (chr 13, n = 47) | Stdev | CV | Diff | Avg (chr X, n = 19) | Stdev | CV | T Test |
|---|---|---|---|---|---|---|---|---|
| chr1 | 0.36536 | 0.01775 | 4.86 | 1.904 | 0.56717 | 0.025988 | 4.58 | 0.001013 |
| chr2 | 0.36400 | 0.009817 | 2.70 | 2.704 | 0.56753 | 0.014871 | 2.62 | |
| chr3 | 0.45168 | 0.007809 | 1.73 | 3.592 | 0.70524 | 0.011932 | 1.69 | |
| chr4 | 0.52541 | 0.005264 | 1.00 | 3.083 | 0.82491 | 0.010537 | 1.28 | |
| chr5 | 0.51010 | 0.007922 | 1.55 | 3.944 | 0.79690 | 0.012227 | 1.53 | 1.29E−11 |
| chr6 | 0.53516 | 0.008575 | 1.60 | 3.758 | 0.83594 | 0.013719 | 1.64 | 2.79E−11 |
| chr7 | 0.58081 | 0.017692 | 3.05 | 2.445 | 0.90507 | 0.026437 | 2.92 | 7.41E−07 |
| chr8 | 0.60261 | 0.015434 | 2.56 | 2.917 | 0.93990 | 0.022506 | 2.39 | 2.11E−08 |
| chr9 | 0.74559 | 0.032065 | 4.30 | 2.102 | 1.15822 | 0.047092 | 4.07 | 0.000228 |
| chr10 | 0.61018 | 0.029139 | 4.78 | 2.060 | 0.94713 | 0.042866 | 4.53 | 0.000964 |
| chr11 | 0.62133 | 0.028323 | 4.56 | 2.081 | 0.96544 | 0.041782 | 4.33 | 0.000419 |
| chr12 | 0.65712 | 0.021853 | 3.33 | 2.380 | 1.02296 | 0.032276 | 3.16 | 3.95E−06 |
| chr13 | | | | | 1.56771 | 0.014258 | 0.91 | 2.47E−15 |
| chr14 | 0.96966 | 0.034017 | 3.51 | 2.233 | 1.50951 | 0.05009 | 3.32 | 8.24E−06 |
| chr15 | 0.99673 | 0.053512 | 5.37 | 1.888 | 1.54618 | 0.077547 | 5.02 | 0.002925 |
| chr16 | 0.95169 | 0.080007 | 8.41 | 1.613 | 1.46673 | 0.117073 | 7.98 | 0.114232 |
| chr17 | 0.98547 | 0.091918 | 9.33 | 1.484 | 1.51571 | 0.132775 | 8.76 | 0.188271 |
| chr18 | 1.13124 | 0.040032 | 3.54 | 2.312 | 1.74146 | 0.072447 | 4.16 | 0.001674 |
| chr19 | 1.41624 | 0.174476 | 12.32 | 1.306 | 2.16586 | 0.252888 | 11.68 | 0.460752 |
| chr20 | 1.17705 | 0.094807 | 8.05 | 1.695 | 1.81576 | 0.137494 | 7.57 | 0.08801 |
| chr21 | 2.33660 | 0.131317 | 5.62 | 1.927 | 3.63243 | 0.235392 | 6.48 | 0.00675 |
| chr22 | 2.01678 | 0.243883 | 12.09 | 1.364 | 3.08943 | 0.34981 | 11.32 | 0.409449 |
| chrX | 0.66679 | 0.028788 | 4.32 | 1.114 | | | | |
| chr2-6 | 0.46751 | 0.006762 | 1.45 | 4.066 | | | | |
| chr3-6 | 0.50332 | 0.005161 | 1.03 | 5.260 | | | | |
| chr_tot | | | | | 1.13209 | 0.038485 | 3.40 | 2.7E−05 |

| Chromosome | Avg (chr Y, n = 26) | Stdev | CV | T Test |
|---|---|---|---|---|
| Chr 1-22, X | 0.00734 | 0.002611 | 30.81 | 1.8E−12 |

Examples of diagnoses of T21, T13, T18 and a case of Turner syndrome, obtained using the normalizing chromosomes, chromosome doses and differentiability for each of the chromosomes of interest, are described in Example 3.

Example 3 Diagnosis of Fetal Aneuploidy Using Normalizing Chromosomes

To apply the use of chromosome doses for assessing aneuploidy in a biological test sample, maternal blood test samples were obtained from pregnant volunteers, and cfDNA was prepared, sequenced and analyzed as described in Examples 1 and 2.

Trisomy 21

Table 4 provides the calculated dose for chromosome 21 in an exemplary test sample (#11403). The calculated threshold for the positive diagnosis of T21 aneuploidy was set at >2 standard deviations from the mean of the qualified (normal) samples. A diagnosis of T21 was given based on the chromosome dose in the test sample being greater than the set threshold. Chromosomes 14 and 15 were used as normalizing chromosomes in separate calculations to show that either a chromosome having the lowest variability (e.g. chromosome 14) or a chromosome having the greatest differentiability (e.g. chromosome 15) can be used to identify the aneuploidy. Thirteen T21 samples were identified using the calculated chromosome doses, and the aneuploidy samples were confirmed to be T21 by karyotype.
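The threshold rule above (mean of the qualified doses plus two standard deviations) can be sketched as follows. The function names are hypothetical. The qualified mean and SD for chr21/chr14 come from Table 2, and the test-sample tag densities come from Table 4. The computed threshold, 0.412698, differs from the 0.412696 listed in Table 4 only in the last digit, presumably because Table 2 reports rounded statistics.

```python
def upper_threshold(mean_dose: float, sd_dose: float, k: float = 2.0) -> float:
    """Threshold k standard deviations above the mean of the qualified doses."""
    return mean_dose + k * sd_dose


def is_trisomy_call(test_dose: float, mean_dose: float, sd_dose: float,
                    k: float = 2.0) -> bool:
    """Positive call when the test-sample dose exceeds the upper threshold."""
    return test_dose > upper_threshold(mean_dose, sd_dose, k)


# Chr21 normalized by chr14: qualified mean and SD from Table 2,
# test-sample tag densities (chr21, chr14) from Table 4.
dose_21 = 333_660 / 795_050
print(round(upper_threshold(0.40671, 0.002994), 6))  # 0.412698
print(is_trisomy_call(dose_21, 0.40671, 0.002994))   # True
```

The same rule also applies to the T18 case, using the chr18/chr8 statistics in Table 2 (mean 0.52677, SD 0.002046) and the tag densities in Table 5.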

TABLE 4 Chromosome Dose for a T21 aneuploidy (sample #11403, 47 XY +21)

| Chromosome | Sequence Tag Density | Dose for Chr 21 | Threshold |
|---|---|---|---|
| Chr21 | 333,660 | 0.419672 | 0.412696 |
| Chr14 | 795,050 | | |
| Chr21 | 333,660 | 0.441038 | 0.433978 |
| Chr15 | 756,533 | | |

Trisomy 18

Table 5 provides the calculated dose for chromosome 18 in a test sample (#11390). The calculated threshold for the positive diagnosis of T18 aneuploidy was set at 2 standard deviations from the mean of the qualified (normal) samples. A diagnosis of T18 was given based on the chromosome dose in the test sample being greater than the set threshold. Chromosome 8 was used as the normalizing chromosome. In this instance, chromosome 8 had the lowest variability and the greatest differentiability. Eight T18 samples were identified using chromosome doses and were confirmed to be T18 by karyotype.

These data show that a normalizing chromosome can have both the lowest variability and the greatest differentiability.

TABLE 5 Chromosome Dose for a T18 aneuploidy (sample #11390, 47 XY +18)

| Chromosome | Sequence Tag Density | Dose for Chr 18 | Threshold |
|---|---|---|---|
| Chr18 | 602,506 | 0.585069 | 0.530867 |
| Chr8 | 1,029,803 | | |

Trisomy 13

Table 6 provides the calculated dose for chromosome 13 in a test sample (#51236). The calculated threshold for the positive diagnosis of T13 aneuploidy was set at 2 standard deviations from the mean of the qualified samples. A diagnosis of T13 was given based on the chromosome dose in the test sample being greater than the set threshold. The chromosome dose for chromosome 13 was calculated using either chromosome 5 or the group of chromosomes 3, 4, 5, and 6 as the normalizing chromosome. One T13 sample was identified.

TABLE 6 Chromosome Dose for a T13 aneuploidy (sample #51236, 47 XY +13)

| Chromosome | Sequence Tag Density | Dose for Chr 13 | Threshold |
|---|---|---|---|
| Chr13 | 692,242 | 0.541343 | 0.52594 |
| Chr5 | 1,278,749 | | |
| Chr13 | 692,242 | 0.530472 | 0.513647 |
| Chr3-6 | 1,304,954 [average] | | |

The sequence tag density for chromosomes 3-6 is the average of the tag counts for chromosomes 3, 4, 5 and 6.
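When a group of chromosomes serves as the normalizing sequence, the denominator is the average tag count over the group, as stated above. The sketch below uses a hypothetical function name and hypothetical per-chromosome densities, because Table 6 reports only the chr3-6 average. With that average (1,304,954) and the chr13 density (692,242), the same formula gives the 0.530472 listed in Table 6.

```python
from statistics import mean
from typing import Dict, Sequence


def group_dose(density: Dict[str, float], numerator: str, group: Sequence[str]) -> float:
    """Dose of `numerator` relative to the average tag density of `group`."""
    return density[numerator] / mean(density[c] for c in group)


# Hypothetical densities chosen so that the chr3-6 average (1,300) is easy to check.
density = {"chr13": 650, "chr3": 1_000, "chr4": 1_200, "chr5": 1_400, "chr6": 1_600}
print(group_dose(density, "chr13", ["chr3", "chr4", "chr5", "chr6"]))  # 0.5
```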

The data show that the combination of chromosomes 3, 4, 5 and 6 provides a variability that is lower than that of chromosome 5, and a greater differentiability than any of the other chromosomes.

Thus, a group of chromosomes can be used as the normalizing chromosome to determine chromosome doses and identify aneuploidies.

Turner Syndrome (Monosomy X)

Table 7 provides the calculated doses for chromosomes X and Y in a test sample (#51238). The calculated threshold for the positive diagnosis of Turner Syndrome (monosomy X) was set for the X chromosome at <−2 standard deviations from the mean, and for the absence of the Y chromosome at <−2 standard deviations from the mean, for qualified (normal) samples.

TABLE 7 Chromosome Dose for a Turner (XO) aneuploidy (sample #51238, 45 X)

| Chromosome | Sequence Tag Density | Dose for Chr X and Chr Y | Threshold |
|---|---|---|---|
| ChrX | 873,631 | 0.786642 | 0.803832 |
| Chr4 | 1,110,582 | | |
| ChrY | 1,321 | 0.001542101 | 0.00211208 |
| Chr_Total (1-22, X) (Average) | 856,623.6 | | |

A sample having an X chromosome dose less than the set threshold was identified as having fewer than two X chromosomes. The same sample was determined to have a Y chromosome dose that was less than the set threshold, indicating that the sample did not have a Y chromosome. Thus, the combination of chromosome doses for X and Y was used to identify the Turner Syndrome (monosomy X) samples. The method provided therefore enables the determination of CNV of chromosomes. In particular, the method enables the determination of chromosomal aneuploidies involving both over- and under-representation, by massively parallel sequencing of maternal plasma cfDNA and identification of normalizing chromosomes for the statistical analysis of the sequencing data. The sensitivity and reliability of the method allow for accurate first and second trimester aneuploidy testing.
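The combined X/Y rule described above (both doses more than two standard deviations below their qualified means) can be sketched as below. The function names are hypothetical. The qualified statistics come from Table 3: chrX normalized by chr4 has mean 0.82491 and SD 0.010537, and chrY normalized by the chr1-22, X average has mean 0.00734 and SD 0.002611. The test-sample densities come from Table 7.

```python
from typing import Tuple


def below_lower_threshold(test_dose: float, mean_dose: float, sd_dose: float,
                          k: float = 2.0) -> bool:
    """True when the dose is more than k SDs below the qualified mean."""
    return test_dose < mean_dose - k * sd_dose


def monosomy_x_call(x_dose: float, y_dose: float,
                    x_stats: Tuple[float, float], y_stats: Tuple[float, float],
                    k: float = 2.0) -> bool:
    """Call monosomy X only when both the X and the Y dose fall below their
    lower thresholds; x_stats and y_stats are (mean, SD) of qualified doses."""
    return (below_lower_threshold(x_dose, *x_stats, k=k)
            and below_lower_threshold(y_dose, *y_stats, k=k))


x_dose = 873_631 / 1_110_582   # chrX / chr4 (Table 7)
y_dose = 1_321 / 856_623.6     # chrY / average of chr1-22 and X (Table 7)
print(monosomy_x_call(x_dose, y_dose, (0.82491, 0.010537), (0.00734, 0.002611)))  # True
```

Requiring both conditions mirrors the text. A low X dose alone would be ambiguous because an unaffected male sample also carries a single X. The accompanying absence of Y is what distinguishes the XO samples.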

Example 4 Determination of Partial Aneuploidy

The use of sequence doses was applied to assess partial aneuploidy in a biological test sample of cfDNA that was prepared from blood plasma and sequenced as described in Example 1. The sample was confirmed by karyotyping to have been derived from a subject with a partial deletion of chromosome 11. Analysis of the sequencing data for the partial aneuploidy (partial deletion of chromosome 11, i.e. q21-q23) was performed as described for the chromosomal aneuploidies in the previous examples. Mapping of the sequence tags to chromosome 11 in a test sample revealed a noticeable loss of tag counts between base pairs 81000082-103000103 in the q arm of the chromosome, relative to the tag counts obtained for the corresponding sequence on chromosome 11 in the qualified samples (data not shown). Qualified sequence doses were determined as ratios of tag densities in all qualified samples. The inputs were the sequence tags mapped to the sequence of interest on chromosome 11 (81000082-103000103 bp) in each qualified sample, and the sequence tags mapped to all 20-megabase segments in the entire genome in the qualified samples (i.e. qualified sequence tag densities). The average sequence dose, standard deviation, and coefficient of variation were calculated for all 20-megabase segments in the entire genome. The 20-megabase sequence having the least variability, which was identified as the normalizing sequence, was on chromosome 5 (13000014-33000033 bp) (see Table 8). This normalizing sequence was used to calculate the dose for the sequence of interest in the test sample (see Table 9). Table 9 provides the sequence dose for the sequence of interest on chromosome 11 (81000082-103000103 bp) in the test sample. This dose was calculated as the ratio of the sequence tags mapped to the sequence of interest to the sequence tags mapped to the identified normalizing sequence. FIG. 10 shows the sequence doses for the sequence of interest in the 7 qualified samples (O) and the sequence dose for the corresponding sequence in the test sample (0). The mean is shown by the solid line, and the calculated threshold for the positive diagnosis of partial aneuploidy, set 5 standard deviations from the mean, is shown by the dashed line. A diagnosis of partial aneuploidy was based on the sequence dose in the test sample being less than the set threshold. The test sample was verified by karyotyping to have deletion q21-q23 on chromosome 11.
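The segment-level call above can be sketched in the same way as the whole-chromosome calls, with a 5-SD lower threshold. The function names are hypothetical, and counting a tag by its start position within inclusive segment bounds is an assumption. With the Table 8 statistics (mean 1.164702, SD 0.004914), the threshold comes out as 1.140132, in line with Table 9. The test-sample segment dose, 27,052 / 25,926 ≈ 1.0434, falls below it.

```python
from typing import Iterable


def segment_tag_count(positions: Iterable[int], start: int, end: int) -> int:
    """Number of tags whose start position lies in [start, end] (inclusive; assumed)."""
    return sum(1 for p in positions if start <= p <= end)


def partial_deletion_call(test_dose: float, mean_dose: float, sd_dose: float,
                          k: float = 5.0) -> bool:
    """Positive call when the segment dose is more than k SDs below the mean."""
    return test_dose < mean_dose - k * sd_dose


# Qualified mean and SD from Table 8; test-sample segment tag densities
# (chr11 segment, chr5 normalizing segment) from Table 9.
segment_dose = 27_052 / 25_926
print(round(1.164702 - 5 * 0.004914, 6))                        # 1.140132
print(partial_deletion_call(segment_dose, 1.164702, 0.004914))  # True
```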

Therefore, in addition to identifying chromosomal aneuploidies, the method of the invention can be used to identify partial aneuploidies.

TABLE 8 Qualified Normalizing Sequence, Dose and Variance for Sequence Chr11: 81000082-103000103 (qualified samples n = 7)

| Normalizing sequence | Avg | Stdev | CV |
|---|---|---|---|
| Chr5: 13000014-33000033 | 1.164702 | 0.004914 | 0.42 |

TABLE 9 Sequence Dose for Sequence of Interest (81000082-103000103) on Chromosome 11 (test sample 11206)

| Chromosome Segment | Sequence Tag Density | Segment Dose for Chr 11 (q21-q23) | Threshold |
|---|---|---|---|
| Chr11: 81000082-103000103 | 27,052 | 1.0434313 | 1.1401347 |
| Chr5: 13000014-33000033 | 25,926 | | |

Example 5 Demonstration of Detection of Aneuploidy

Sequencing data obtained for the samples described in Examples 2 and 3, and shown in FIGS. 2-6, were further analyzed to illustrate the sensitivity of the method in identifying aneuploidies in maternal samples. Normalized chromosome doses for chromosomes 21, 18, 13, X and Y were analyzed as a distribution relative to the standard deviation of the mean (Y-axis) and are shown in FIG. 11. The normalizing chromosome used is shown as the denominator (X-axis).

FIG. 11 (A) shows the distribution of chromosome doses relative to the standard deviation from the mean for chromosome 21 dose in the unaffected samples (o) and the trisomy 21 samples (T21; A), using chromosome 14 as the normalizing chromosome for chromosome 21. FIG. 11 (B) shows the distribution of chromosome doses relative to the standard deviation from the mean for chromosome 18 dose in the unaffected samples (o) and the trisomy 18 samples (T18; A), using chromosome 8 as the normalizing chromosome for chromosome 18. FIG. 11 (C) shows the distribution of chromosome doses relative to the standard deviation from the mean for chromosome 13 dose in the unaffected samples (o) and the trisomy 13 samples (T13; A), using the average sequence tag density of the group of chromosomes 3, 4, 5, and 6 as the normalizing chromosome to determine the chromosome dose for chromosome 13. FIG. 11 (D) shows the distribution of chromosome doses relative to the standard deviation from the mean for chromosome X dose in the unaffected female samples (o), the unaffected male samples (Δ), and the monosomy X samples (XO; +), using chromosome 4 as the normalizing chromosome for chromosome X. FIG. 11 (E) shows the distribution of chromosome doses relative to the standard deviation from the mean for chromosome Y dose in the unaffected male samples (o), the unaffected female samples (Δ), and the monosomy X samples (+), using the average sequence tag density of the group of chromosomes 1-22 and X as the normalizing chromosome to determine the chromosome dose for chromosome Y.
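The plotted quantity in FIG. 11 is described as a chromosome dose relative to the standard deviation from the mean. A minimal sketch, assuming it is the signed number of qualified-sample SDs between a dose and the qualified mean, is shown below. That reading is an assumption, and the function name is hypothetical. With the chr21/chr14 statistics from Table 2 and the T21 sample from Table 4, the value is about 4.33.

```python
def sds_from_mean(dose: float, mean_dose: float, sd_dose: float) -> float:
    """Signed distance of a dose from the qualified mean, in SD units (assumed axis)."""
    return (dose - mean_dose) / sd_dose


# T21 sample #11403, chr21 normalized by chr14 (Tables 2 and 4).
print(round(sds_from_mean(333_660 / 795_050, 0.40671, 0.002994), 2))  # 4.33
```

Affected samples fall on the far side of the 2-SD lines: positive values for trisomies, and negative values for the monosomy X sample's X dose.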

The data show that the trisomy 21, trisomy 18 and trisomy 13 samples were clearly distinguishable from the unaffected (normal) samples. The monosomy X samples were easily identifiable as having chromosome X doses that were clearly lower than those of the unaffected female samples (FIG. 11 (D)), and as having chromosome Y doses that were clearly lower than those of the unaffected male samples (FIG. 11 (E)). Therefore, the method provided is sensitive and specific for determining the presence or absence of chromosomal aneuploidies in a maternal blood sample.

Example 6 Determination of Fetal Chromosomal Abnormalities Using Massively Parallel DNA Sequencing of Cell Free Fetal DNA from Maternal Blood: Test Set 1 Independent of Training Set 1

The study was conducted by qualified site clinical research personnel at 13 US clinic locations between April 2009 and July 2010, under a human subject protocol approved by the institutional review boards (IRBs) at each institution. Informed written consent was obtained from each subject prior to study participation. The protocol was designed to provide blood samples and clinical data to support development of noninvasive prenatal genetic diagnostic methods. Pregnant women aged 18 years or older were eligible for inclusion. For patients undergoing clinically indicated CVS or amniocentesis, blood was collected prior to performance of the procedure, and the results of fetal karyotyping were also collected. Peripheral blood samples (two tubes, or ~20 mL total) were drawn from all subjects in acid citrate dextrose (ACD) tubes (Becton Dickinson). All samples were de-identified and assigned an anonymous patient ID number. Blood samples were shipped overnight to the laboratory in temperature-controlled shipping containers provided for the study. The time elapsed between blood draw and sample receipt was recorded as part of sample accessioning.

Site research coordinators entered clinical data relevant to the patient's current pregnancy and history into study case report forms (CRFs) using the anonymous patient ID number. Cytogenetic analysis of the fetal karyotype from invasive prenatal procedure samples was performed by local laboratories, and the results were also recorded in the study CRFs. All data obtained on CRFs were entered into a clinical database at the laboratory. Cell-free plasma was obtained from individual blood tubes using a two-step centrifugation process within 24-48 hours of venipuncture. Plasma from a single blood tube was sufficient for sequencing analysis. Cell-free DNA was extracted from cell-free plasma using the QIAamp DNA Blood Mini kit (Qiagen) according to the manufacturer's instructions. Because cell-free DNA fragments are known to be approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), no fragmentation of the DNA was required prior to sequencing.

For the training set samples, cfDNA was sent to Prognosys Biosciences, Inc. (La Jolla, Calif.) for sequencing library preparation (cfDNA blunt ended and ligated to universal adapters) and for sequencing using standard manufacturer protocols with the Illumina Genome Analyzer IIx instrumentation (available on the worldwide web at illumina.com). Single-end reads of 36 base pairs were obtained. Upon completion of the sequencing, all base call files were collected and analyzed. For the test set samples, sequencing libraries were prepared, and sequencing was carried out on an Illumina Genome Analyzer IIx instrument. Sequencing library preparation was performed as follows. The full-length protocol described is essentially the standard protocol provided by Illumina. It differs from the Illumina protocol only in the purification of the amplified library: the Illumina protocol instructs that the amplified library be purified using gel electrophoresis, while the protocol described herein uses magnetic beads for the same purification step. Approximately 2 ng of purified cfDNA that had been extracted from maternal plasma was used to prepare a primary sequencing library using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, Mass.) for Illumina®, essentially according to the manufacturer's instructions. All steps except the final purification of the adaptor-ligated products were performed according to the protocol accompanying the NEBNext™ Reagents for Sample Preparation for a genomic DNA library that is sequenced using the Illumina® GAII. That final purification was performed using Agencourt magnetic beads and reagents instead of the purification column. The NEBNext™ protocol essentially follows that provided by Illumina, which is available at grcf.jhml.edu/hts/protocols/11257047_ChIP_Sample Prep.pdf.

The overhangs of approximately 2 ng purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext® End Repair Module. The 40 μl cfDNA was incubated with 5 μl 10× phosphorylation buffer, 2 μl deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl T4 DNA Polymerase and 1 μl T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, in a 200 μl microfuge tube in a thermal cycler for 30 minutes at 20° C. The sample was cooled to 4° C. and purified using a QIAquick column provided in the QIAquick PCR Purification Kit (QIAGEN Inc., Valencia, Calif.) as follows. The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB were added. The resulting 300 μl were transferred to a QIAquick column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl Qiagen Buffer PE and re-centrifuged. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 39 μl Qiagen Buffer EB by centrifugation. dA tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of the dA-tailing master mix containing the Klenow fragment (3′ to 5′ exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1), and incubating for 30 minutes at 37° C. according to the manufacturer's NEBNext® dA-Tailing Module. The sample was cooled to 4° C. and purified using a column provided in the MinElute PCR Purification Kit (QIAGEN Inc., Valencia, Calif.) as follows. The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB were added. The 300 μl were transferred to the MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl Qiagen Buffer PE and re-centrifuged. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM.
The DNA was eluted in 15 μl Qiagen Buffer EB by centrifugation. Ten microliters of the DNA eluate were incubated with 1 μl of a 1:5 dilution of the Illumina Genomic Adapter Oligo Mix (Part No. 1000521), 15 μl of 2× Quick Ligation Reaction Buffer, and 4 μl Quick T4 DNA Ligase for 15 minutes at 25° C., according to the NEBNext® Quick Ligation Module. The sample was cooled to 4° C. and purified using a MinElute column as follows. One hundred and fifty microliters of Qiagen Buffer PE were added to the 30 μl reaction, and the entire volume was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl Qiagen Buffer PE and re-centrifuged. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 28 μl Qiagen Buffer EB by centrifugation. Twenty-three microliters of the adaptor-ligated DNA eluate were subjected to 18 cycles of PCR (98° C. for 30 seconds; 18 cycles of 98° C. for 10 seconds, 65° C. for 30 seconds, and 72° C. for 30 seconds; final extension at 72° C. for 5 minutes, and hold at 4° C.) using Illumina Genomic PCR Primers (Part Nos. 1000537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, Mass.) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts and other contaminants, and recovers amplicons greater than 100 bp.
The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the libraries was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, Calif.). For both the training and test sample sets, single-end reads of 36 base pairs were sequenced.

Data Analysis and Sample Classification

Sequence reads 36 bases in length were aligned to the human genome assembly hg18 obtained from the UCSC database (hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips). Alignments were carried out using the Bowtie short read aligner (version 0.12.5), allowing for up to two base mismatches during alignment (Langmead et al., Genome Biol 10:R25 [2009]). Only reads that unambiguously mapped to a single genomic location were included. Genomic sites where reads mapped were counted and included in the calculation of chromosome doses (see below). Regions on the Y chromosome where sequence tags from male and female fetuses map without any discrimination were excluded from the analysis (specifically, from base 0 to base 2×10⁶; base 10×10⁶ to base 13×10⁶; and base 23×10⁶ to the end of chromosome Y).
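A minimal Python sketch of the chromosome Y exclusion step follows. It assumes that tag positions are 0-based coordinates on hg18 chromosome Y and treats each excluded region as a half-open interval; the text does not state whether the region boundaries are inclusive. The names `EXCLUDED_Y_REGIONS`, `is_excluded_y_position` and `count_informative_y_tags` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Excluded chrY regions from the text (hg18), as half-open [start, end)
# intervals; the boundary convention is an assumption. None = end of chrY.
EXCLUDED_Y_REGIONS: List[Tuple[int, Optional[int]]] = [
    (0, 2_000_000),
    (10_000_000, 13_000_000),
    (23_000_000, None),
]


def is_excluded_y_position(pos: int) -> bool:
    """True if a 0-based chrY position lies in a region shared by male and female samples."""
    return any(
        pos >= start and (end is None or pos < end)
        for start, end in EXCLUDED_Y_REGIONS
    )


def count_informative_y_tags(positions: Iterable[int]) -> int:
    """Count chrY tags that fall outside the excluded regions (hypothetical helper)."""
    return sum(1 for pos in positions if not is_excluded_y_position(pos))


# Hypothetical tag positions: only 5 Mb and 15 Mb lie outside the excluded regions.
print(count_informative_y_tags([1_000_000, 5_000_000, 11_000_000, 15_000_000, 30_000_000]))  # 2
```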

Intra-run and inter-run sequencing variation in the chromosomal distribution of sequence reads can obscure the effects of fetal aneuploidy on the distribution of mapped sequence sites. To correct for such variation, a chromosome dose was calculated as the count of mapped sites for a given chromosome of interest normalized to the counts observed on a predetermined normalizing chromosome sequence. As described previously, a normalizing chromosome sequence can be composed of a single chromosome or a group of chromosomes. The normalizing chromosome sequence was first identified in a subset of training set samples that were unaffected, i.e. qualified samples having diploid karyotypes for chromosomes of interest 21, 18, 13 and X, considering each autosome as a potential denominator in a ratio of counts with our chromosomes of interest. Denominator chromosomes, i.e. normalizing chromosome sequences, were selected that minimized the variation of the chromosome doses within and between sequencing runs. Each chromosome of interest was determined to have a distinct normalizing chromosome sequence (denominator) (Table 10). No single chromosome could be identified as a normalizing chromosome sequence for chromosome 13, as no one chromosome was determined to reduce the variability in the dose of chromosome 13 across samples, i.e. the spread of the NCV values for chromosome 13 was not reduced sufficiently to allow for a correct identification of a T13 aneuploidy. Chromosomes 2-6 were chosen randomly and tested for their ability as a group to mimic the behavior of chromosome 13. The group of chromosomes 2-6 was found to diminish substantially the variability in the dose for chromosome 13 in the training samples, and was thus chosen as the normalizing chromosome sequence for chromosome 13.
As described above, the variability in chromosome dose for chromosome Y is greater than 30, independently of which single chromosome is used as the normalizing chromosome sequence in determining the chromosome Y dose. The group of chromosomes 2-6 was found to diminish substantially the variability in the dose for chromosome Y in the training samples, and was thus chosen as the normalizing chromosome sequence for chromosome Y.

The chromosome doses for each of the chromosomes of interest in the qualified samples provide a measure of the variation in the total number of mapped sequence tags for each chromosome of interest relative to that of each of the remaining chromosomes. Thus, qualified chromosome doses can identify the chromosome or group of chromosomes, i.e. the normalizing chromosome sequence, whose variation among samples is closest to the variation of the chromosome of interest, and which would therefore serve as the ideal sequence for normalizing values for further statistical evaluation.

Chromosome doses for all samples in the training set, i.e. qualified and affected, also serve as the basis for determining threshold values when identifying aneuploidies in test samples, as described in the following.

TABLE 10
Normalizing Chromosome Sequences for Determining Chromosome Doses

Chromosome of Interest | Chromosome of Interest-Numerator (Chr mapped counts) | Normalizing Chromosome Sequence-Denominator (Chr mapped counts)
21 | Chr 21 | Chr 9
18 | Chr 18 | Chr 8
13 | Chr 13 | Sum(Chr 2-6)
X | Chr X | Chr 6
Y | Chr Y | Sum(Chr 2-6)

For each chromosome of interest in each sample in the test set, a normalizing value was determined and used to determine the presence or absence of an aneuploidy. The normalizing value was calculated as a chromosome dose, which can be further computed to provide a normalized chromosome value (NCV).

Chromosome Doses

For the test set, a chromosome dose was calculated for each chromosome of interest (21, 18, 13, X and Y) in every sample. As provided in Table 10 above, the chromosome dose for chromosome 21 was calculated as the ratio of the number of tags in the test sample that mapped to chromosome 21 to the number of tags in the same sample that mapped to chromosome 9; the dose for chromosome 18 as the ratio of the tags mapped to chromosome 18 to those mapped to chromosome 8; the dose for chromosome 13 as the ratio of the tags mapped to chromosome 13 to the sum of those mapped to chromosomes 2-6; the dose for chromosome X as the ratio of the tags mapped to chromosome X to those mapped to chromosome 6; and the dose for chromosome Y as the ratio of the tags mapped to chromosome Y to the sum of those mapped to chromosomes 2-6.
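The dose calculations above can be sketched as follows. This is an illustrative Python sketch, not the original analysis code: `TABLE_10_DENOMINATORS` and `chromosome_dose` are hypothetical names, and the input is assumed to be a dictionary of uniquely mapped tag counts keyed by chromosome name.

```python
from typing import Dict, List

# Table 10: chromosome of interest -> chromosomes whose tag counts are summed
# to form the normalizing chromosome sequence (denominator).
TABLE_10_DENOMINATORS: Dict[str, List[str]] = {
    "21": ["9"],
    "18": ["8"],
    "13": ["2", "3", "4", "5", "6"],
    "X": ["6"],
    "Y": ["2", "3", "4", "5", "6"],
}


def chromosome_dose(tag_counts: Dict[str, int], chrom: str) -> float:
    """Tags mapped to `chrom` divided by tags mapped to its normalizing sequence."""
    denominator = sum(tag_counts[c] for c in TABLE_10_DENOMINATORS[chrom])
    if denominator == 0:
        raise ValueError(f"no tags mapped to the normalizing sequence for chr{chrom}")
    return tag_counts[chrom] / denominator


# Hypothetical tag counts for one sample.
sample = {"21": 300, "9": 1_000, "13": 500, "2": 100, "3": 100, "4": 100, "5": 100, "6": 100}
print(chromosome_dose(sample, "21"))  # 0.3
print(chromosome_dose(sample, "13"))  # 1.0
```

Keeping the denominators in a lookup table means a different normalizing sequence can be substituted without changing the dose function itself.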

Normalized Chromosome Values

Using the chromosome dose for each of the chromosomes of interest in each of the test samples, and the mean and standard deviation of the corresponding chromosome dose determined in the qualified samples of the training set, a normalized chromosome value (NCV) was calculated using the equation:

$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_{j}}{\hat{\sigma}_{j}}$

where $\hat{\mu}_{j}$ and $\hat{\sigma}_{j}$ are the estimated training set mean and standard deviation, respectively, for the j-th chromosome dose, and $x_{ij}$ is the observed j-th chromosome dose for sample i. When chromosome doses are normally distributed, the NCV is equivalent to a statistical z-score for the doses. No significant departure from linearity is observed in a quantile-quantile plot of the NCVs from unaffected samples. In addition, standard tests of normality for the NCVs fail to reject the null hypothesis of normality.
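A Python sketch of the NCV calculation, assuming the training doses for chromosome j are available as a list. Whether the original analysis used the sample (n − 1) or the population standard deviation is not stated; the sketch assumes the sample standard deviation from Python's `statistics.stdev`, and `normalized_chromosome_value` is a hypothetical name.

```python
import statistics
from typing import Sequence


def normalized_chromosome_value(test_dose: float, training_doses: Sequence[float]) -> float:
    """NCV_ij = (x_ij - mu_j) / sigma_j, with mu_j and sigma_j estimated from
    the chromosome j doses of qualified (unaffected) training samples."""
    mu_hat = statistics.mean(training_doses)
    sigma_hat = statistics.stdev(training_doses)  # sample SD, n - 1 denominator (assumed)
    return (test_dose - mu_hat) / sigma_hat


# Hypothetical training doses for one chromosome of interest: mean 0.316, SD 0.002.
training = [0.314, 0.316, 0.318]
print(round(normalized_chromosome_value(0.326, training), 6))  # 5.0
```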

For the test set, an NCV was calculated for each chromosome of interest (21, 18, 13, X and Y) for every sample. To ensure a safe and effective classification scheme, conservative boundaries were chosen for aneuploidy classification. For classification of the aneuploidy state of the autosomes, an NCV > 4.0 was required to classify the chromosome as affected (i.e. aneuploid for that chromosome), and an NCV < 2.5 to classify a chromosome as unaffected. Samples with autosomes having an NCV between 2.5 and 4.0 were classified as “no call”.
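The autosomal decision rule can be written as a small function. This sketch assumes that NCVs of exactly 2.5 or 4.0 fall in the no-call band, because the text does not say how the boundary values themselves are treated; `classify_autosome` is a hypothetical name.

```python
def classify_autosome(ncv: float) -> str:
    """Conservative autosomal call: NCV > 4.0 affected, NCV < 2.5 unaffected,
    otherwise (2.5 <= NCV <= 4.0, boundaries assumed inclusive) no call."""
    if ncv > 4.0:
        return "affected"
    if ncv < 2.5:
        return "unaffected"
    return "no call"


print(classify_autosome(3.0))  # no call
```

The gap between the two thresholds trades a small number of no calls for fewer misclassifications near the boundary.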

Sex chromosome classification in the test set was performed by sequential application of NCVs for both X and Y as follows:

1. If NCV Y > −2.0 standard deviations from the mean of male samples, then the sample was classified as male (XY).
2. If NCV Y < −2.0 standard deviations from the mean of male samples, and NCV X > −2.0 standard deviations from the mean of female samples, then the sample was classified as female (XX).
3. If NCV Y < −2.0 standard deviations from the mean of male samples, and NCV X < −3.0 standard deviations from the mean of female samples, then the sample was classified as monosomy X, i.e. Turner syndrome.
4. If the NCVs did not fit into any of the above criteria, then the sample was classified as a “no call” for sex.
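The sequential sex-chromosome rules can be sketched as below. The sketch assumes that `ncv_y_vs_male` is the chromosome Y NCV computed against the mean and SD of male training samples, and `ncv_x_vs_female` the chromosome X NCV computed against female training samples; both parameter names and `classify_sex` are hypothetical. Values that satisfy none of rules 1 to 3 (for example, a low Y NCV with an X NCV between −3.0 and −2.0) fall through to a no call, mirroring rule 4.

```python
def classify_sex(ncv_y_vs_male: float, ncv_x_vs_female: float) -> str:
    """Apply the four sex-chromosome rules in order (hypothetical helper)."""
    if ncv_y_vs_male > -2.0:  # rule 1
        return "male (XY)"
    if ncv_y_vs_male < -2.0 and ncv_x_vs_female > -2.0:  # rule 2
        return "female (XX)"
    if ncv_y_vs_male < -2.0 and ncv_x_vs_female < -3.0:  # rule 3
        return "monosomy X"
    return "no call"  # rule 4


print(classify_sex(-10.0, -2.5))  # no call
```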

Results

Study Population Demographics

A total of 1,014 patients were enrolled between April 2009 and July 2010. The patient demographics, invasive procedure type and karyotype results are summarized in Table 11. The average age of study participants was 35.6 years (range 17 to 47 years), and gestational age ranged from 6 weeks, 1 day to 38 weeks, 1 day (mean 15 weeks, 4 days). The overall incidence of abnormal fetal chromosome karyotypes was 6.8%, with a T21 incidence of 2.5%. Of 946 subjects with singleton pregnancies and a karyotype, 906 (96%) showed at least one clinically recognized risk factor for fetal aneuploidy prior to the prenatal procedure. Even eliminating those with advanced maternal age as their sole indication, the data demonstrate a very high false positive rate for current screening modalities. Ultrasound findings of increased nuchal translucency, cystic hygroma, or other structural congenital abnormality were most predictive of abnormal karyotype in this cohort.

TABLE 11
Patient Demographics

 | Total Enrolled (N = 1014) | Training Set (N = 71) | Test Set (N = 48)
Dates of Enrollment | April 2009-July 2010 | April 2009-December 2009 | January 2010-June 2010
Number enrolled | 1014 | 435 | 575
Maternal Age, yrs
  Mean (SD) | 35.6 (5.66) | 36.4 (6.05) | 34.2 (8.22)
  Min/Max | 17/47 | 20/46 | 18/46
  Not Specified, N | 11 | 3 | 0
Ethnicity, N (%)
  Caucasian | 636 (62.7) | 50 (70.4) | 24 (50.0)
  Hispanic | 167 (16.5) | 6 (8.5) | 13 (27.0)
  Asian | 63 (6.2) | 6 (8.5) | 5 (10.4)
  Multi, more than one | 53 (5.2) | 6 (8.5) | 1 (2.1)
  African American | 41 (4.0) | 1 (1.3) | 3 (6.3)
  Other | 36 (3.6) | 2 (2.8) | 1 (2.1)
  Native American | 9 (0.9) | 0 (0.0) | 1 (2.1)
  Not Specified | 9 (0.9) | 0 (0.0) | 0 (0.0)
Gestational Age, wks, days
  Mean | 15 w 4 d | 14 w 5 d | 15 w 3 d
  Min/Max | 6 w 1 d/38 w 1 d | 10 w 0 d/23 w 1 d | 10 w 4 d/28 w 3 d
Number of Fetuses, N
  1 | 982 | 67 | 47
  2 | 30 | 4 | 1
  3 | 2 | 0 | 0
Prenatal Procedure, N (%)
  CVS | 430 (42.4) | 38 (53.5) | 28 (58.3)
  Amniocentesis | 571 (56.3) | 32 (45.1) | 20 (41.7)
  Not specified | 3 (0.3) | 1 (1.4) | 0 (0.0)
  Not performed | 10 (1.0) | 0 (0.0) | 0 (0.0)
Fetal Karyotype, N (%)
  46,XX | 453* (43.9) | 22* (29.7) | 7* (14.6)
  46,XY | 474* (45.9) | 26* (35.1) | 14 (29.2)
  47,+21, both sexes | 25* (2.4) | 10* (13.5) | 13 (27.1)
  47,+18, both sexes | 14 (1.4) | 5 (6.8) | 8 (16.7)
  47,+13, both sexes | 4 (0.4) | 2 (2.7) | 1 (2.1)
  45,X | 8 (0.8) | 3 (4.1) | 3 (6.3)
  Complex, other | 18* (1.7) | 6 (8.1) | 2 (4.2)
  Karyotype not available | 36 (3.5) | 0 (0.0) | 0 (0.0)

Prenatal Screening Risks for Singletons, N (%) | Non-sequenced Karyotyped (N = 834) | Analyzed Training (N = 65) | Analyzed Test (N = 47)
  AMA only (≥35 years) | 445 (53.4) | 27 (41.5) | 21 (44.7)
  Screen positive (trisomy)** | 149 (17.9) | 18 (27.7) | 9 (19.1)
  Increased NT | 35 (4.2) | 3 (4.6) | 5 (10.6)
  Cystic Hygroma | 12 (1.4) | 5 (7.7) | 4 (8.5)
  Cardiac Defect | 14 (1.7) | 0 (0.0) | 4 (8.5)
  Other Congenital Abnormality | 78 (9.4) | 4 (6.2) | 3 (6.4)
  Other Maternal Risk | 64 (7.7) | 5 (7.7) | 1 (2.1)
  None specified | 37 (4.4) | 3 (4.6) | 0 (0.0)

*Includes results of fetuses from multiple gestations. **Assessed and reported by clinicians.
Abbreviations: AMA = Advanced Maternal Age, NT = nuchal translucency

The distribution of diverse ethnic backgrounds represented in this study population is also shown in Table 11. Overall, 63% of the patients in this study were Caucasian, 17% Hispanic, 6% Asian, 5% multi-ethnic, and 4% African American. The ethnic diversity varied considerably from site to site. For example, one site enrolled 60% Hispanic and 26% Caucasian subjects, while three clinics, all located in the same state, enrolled no Hispanic subjects. As expected, no discernible differences were observed in our results for different ethnicities.

Training Data Set 1

The training set study selected 71 samples from the initial sequential accumulation of 435 samples that were collected between April 2009 and December 2009. All subjects with affected fetuses (abnormal karyotypes) in this first series of subjects were included for sequencing, together with a random selection of non-affected subjects with adequate sample and data. Clinical characteristics of the training set patients were consistent with the overall study demographics, as shown in Table 11. The gestational age of the samples in the training set ranged from 10 weeks, 0 days to 23 weeks, 1 day. Thirty-eight patients underwent CVS, 32 underwent amniocentesis, and 1 patient did not have the invasive procedure type specified (an unaffected karyotype, 46,XY). Seventy percent of the patients were Caucasian, 8.5% Hispanic, 8.5% Asian, and 8.5% multi-ethnic. Six sequenced samples were removed from this set for the purposes of training: 4 samples from subjects with twin gestations (further discussed below), 1 sample with T18 that was contaminated during preparation, and 1 sample with a fetal karyotype of 69,XXX, leaving 65 samples for the training set.

The number of unique sequence sites (i.e. tags identified with unique sites in the genome) varied from 2.2M in the early phases of the training set study to 13.7M in the latter phases, owing to improvements in sequencing technology over time. To monitor for any potential shifts in the chromosome doses over this 6-fold range in unique sites, different unaffected samples were run at the beginning and end of the study. For the first 15 unaffected samples run, the average number of unique sites was 3.8M, and the average chromosome doses for chromosome 21 and chromosome 18 were 0.314 and 0.528, respectively. For the last 15 unaffected samples run, the average number of unique sites was 10.7M, and the average chromosome doses for chromosome 21 and chromosome 18 were 0.316 and 0.529, respectively. There was no statistically significant difference between the chromosome doses for chromosome 21 and chromosome 18 over the time of the training set study.

The training set NCVs for chromosomes 21, 18 and 13 are shown in FIG. 12. The results shown in FIG. 12 are consistent with an assumption of normality, in that roughly 99% of the diploid NCVs would fall within ±2.5 standard deviations of the mean. Of this set of 65 samples, 8 samples with clinical karyotypes indicating T21 had NCVs ranging from 6 to 20. Four samples having clinical karyotypes indicative of fetal T18 had NCVs ranging from 3.3 to 12, and the two samples having karyotypes indicative of fetal trisomy 13 (T13) had NCVs of 2.6 and 4. The spread of the NCVs in affected samples is due to their dependence on the percentage of fetal cfDNA in the individual samples.
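The "roughly 99%" figure follows from the standard normal distribution: the probability that a z-score lies within ±k standard deviations of the mean is erf(k/√2). A short check using only the Python standard library (`fraction_within` is a hypothetical helper name):

```python
import math


def fraction_within(k: float) -> float:
    """P(|Z| <= k) for a standard normal Z, equal to erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))


print(f"{fraction_within(2.5):.4f}")  # 0.9876
```

For k = 2.5 this gives about 98.8%, which rounds to the 99% quoted above.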

Similar to the autosomes, the means and standard deviations for the sex chromosomes were established in the training set. The sex chromosome thresholds allowed 100% identification of male and female fetuses in the training set.

Test Data Set 1

Having established chromosome dose means and standard deviations from the training set, a test set of 48 samples was selected from the 575 total samples collected between January 2010 and June 2010. One of the samples from a twin gestation was removed from the final analysis, leaving 47 samples in the test set. Personnel preparing samples for sequencing and operating the equipment were blinded to the clinical karyotype information. The gestational age range was similar to that seen in the training set (Table 11). Fifty-eight percent of the invasive procedures were CVS, higher than in the overall procedural demographics but similar to the training set. Fifty percent of subjects were Caucasian, 27% Hispanic, 10.4% Asian and 6.3% African American.

In the test set, the number of unique sequence tags varied from approximately 13M to 26M. For unaffected samples, the mean chromosome doses for chromosome 21 and chromosome 18 were 0.313 and 0.527, respectively. The test set NCVs for chromosome 21, chromosome 18 and chromosome 13 are shown in FIG. 13, and the classifications are given in Table 12.

TABLE 12
Test Set Classification Data

T21 classification
Karyotype | Unaffected for T21 | T21 | No Call
Unaffected for T21 | 34 | 0 | 0
47,XX or XY,+21 | 0 | 13 | 0

T18 classification
Karyotype | Unaffected for T18 | T18 | No Call
Unaffected for T18 | 39 | 0 | 0
47,XX or XY,+18 | 0 | 8 | 0

T13 classification
Karyotype | Unaffected for T13 | T13 | No Call
Unaffected for T13 | 46 | 0 | 0
47,XX or XY,+13 | 0 | 0 | 1

Sex Chromosome Classification
Karyotype | XY | XX | MX* | No Call
46,XY | 24 | 0 | 0 | 0
46,XX | 0 | 18 | 0 | 1
45,X | 0 | 0 | 2 | 1
Cplx | 1 | 0 | 0 | 0

*MX is monosomy in the X chromosome with no evidence of a Y chromosome

In the test set, 13/13 subjects having clinical karyotypes that indicated fetal T21 were correctly identified, with NCVs ranging from 5 to 14. Eight of eight subjects having karyotypes that indicated fetal T18 were correctly identified, with NCVs ranging from 8.5 to 22. The single sample having a karyotype classified as T13 in this test set was classified as a no call, with an NCV of approximately 3.

For the test data set, all male samples were correctly identified, including a sample with a complex karyotype, 46,XY+marker chromosome (unidentifiable by cytogenetics) (Table 3). Nineteen of twenty female samples were correctly identified, and one female sample was categorized as a no call. Of the three samples in the test set with a karyotype of 45,X, two were correctly identified as monosomy X and one was classified as a no call (Table 12).

Twins

Four of the samples initially selected for the training set and one of the samples in the test set were from twin gestations. The thresholds employed here could be confounded by the differing amount of cfDNA expected in the setting of a twin gestation. In the training set, the karyotype from one of the twin samples was monochorionic 47,XY+21. A second twin sample was fraternal, and amniocentesis was carried out on each of the fetuses individually. In this twin gestation, one of the fetuses had a karyotype of 47,XY+21, while the other had a normal karyotype, 46,XX. In both of these cases, the cell-free classification based on the methods discussed above classified the sample as T21. The other two twin gestations in the training set were classified correctly as non-affected for T21 (all twins showed a diploid karyotype for chromosome 21). For the twin gestation sample in the test set, the karyotype was only established for Twin B (46,XX), and the algorithm correctly classified the sample as non-affected for T21.

Conclusion

The data show that massively parallel sequencing can be used to determine a plurality of abnormal fetal karyotypes from the blood of pregnant women. These data demonstrate 100% correct classification of samples with trisomy 21 and trisomy 18 in independent test set data. Even in the case of fetuses with abnormal sex chromosome karyotypes, none of the samples were incorrectly classified with the algorithm of the method. Importantly, the algorithm also performed well in determining the presence of T21 in two sets of twin pregnancies having at least one affected fetus, which has never been shown previously. Furthermore, this study examined a variety of sequential samples from multiple centers; these samples represent the range of abnormal karyotypes that one is likely to witness in a commercial clinical setting, and the results show the significance of accurately classifying pregnancies not affected by common trisomies in addressing the unacceptably high false positive rates that remain in prenatal screening today. The data provide valuable insight into the capabilities of employing this method in the future. Analysis of subsets of the unique genomic sites showed increases in the variance consistent with Poisson counting statistics.
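Under a simple Poisson model, used here only as an illustrative assumption, the relative variability of a chromosome dose A/B can be approximated with the first-order delta method as CV² ≈ 1/E[A] + 1/E[B], treating A and B as independent counts. Subsampling the unique sites by a factor s scales both expected counts by s, so the CV grows by 1/√s; halving the sites raises it by √2. The expected counts and the name `poisson_dose_cv` below are hypothetical.

```python
import math


def poisson_dose_cv(expected_numerator: float, expected_denominator: float) -> float:
    """First-order (delta-method) CV of A/B for independent Poisson counts A and B."""
    return math.sqrt(1.0 / expected_numerator + 1.0 / expected_denominator)


# Hypothetical expected tag counts for a chromosome of interest and its denominator.
full = poisson_dose_cv(150_000, 480_000)
half = poisson_dose_cv(75_000, 240_000)  # same sample, half of the unique sites
print(f"CV with all sites: {full:.4%}, with half: {half:.4%}, ratio: {half / full:.3f}")
```

Because tags from one run are really multinomial rather than independent Poisson draws, this is an approximation; it is meant only to show the expected 1/√N scaling.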

The data build on the findings of Fan and Quake, who demonstrated that the sensitivity of noninvasive prenatal determination of fetal aneuploidy from maternal plasma using massively parallel sequencing is limited only by the counting statistics (Fan and Quake, PLoS One 5:e10439 [2010]). Because sequencing information was collected across the entire genome, this method is capable of determining any aneuploidy or other copy number variation, including insertions and deletions. The karyotype from one of the samples had a small deletion in chromosome 11 between q21 and q23 that was observed as a ~10% decrease in the relative number of tags in a 25 Mb region starting at q21 when the sequencing data were analyzed in 500 kb bins. In addition, in the training set, three of the samples had complex sex karyotypes due to mosaicism in the cytogenetic analysis. These karyotypes were: i) 47,XXX[9]/45,X[6], ii) 45,X[3]/46,XY[17], and iii) 47,XXX[13]/45,X[7]. Sample ii, which showed some XY-containing cells, was correctly classified as XY. Samples i (from a CVS procedure) and iii (from amniocentesis), which both showed a mixture of XXX and X cells by cytogenetic analysis (consistent with mosaic Turner syndrome), were classified as a no call and monosomy X, respectively.
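A sketch of the 500 kb binning step, under stated assumptions: tag positions are 0-based coordinates on a single chromosome, and a window of interest is compared with a reference profile after normalizing each profile by its own tags outside the window. That normalization and the names `bin_counts` and `relative_window_depth` are illustrative choices, not the procedure described in the text. A 25 Mb window corresponds to 50 bins of 500 kb.

```python
from typing import Iterable, List, Sequence

BIN_SIZE = 500_000  # 500 kb bins, as in the text


def bin_counts(positions: Iterable[int], chrom_length: int, bin_size: int = BIN_SIZE) -> List[int]:
    """Count tags per fixed-width bin along one chromosome (0-based positions)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} outside chromosome of length {chrom_length}")
        counts[pos // bin_size] += 1
    return counts


def relative_window_depth(test_bins: Sequence[int], reference_bins: Sequence[int],
                          start_bin: int, n_bins: int) -> float:
    """(window / outside-window) tag ratio in the test profile divided by the same
    ratio in the reference profile; 1.0 means no change, 0.9 a 10% decrease."""
    def window_ratio(bins: Sequence[int]) -> float:
        window = sum(bins[start_bin:start_bin + n_bins])
        outside = sum(bins) - window
        return window / outside

    return window_ratio(test_bins) / window_ratio(reference_bins)


# Hypothetical profiles: 200 bins, with a 25 Mb (50-bin) window at 90% depth in the test.
reference = [100] * 200
test = [100] * 200
test[10:60] = [90] * 50
print(round(relative_window_depth(test, reference, start_bin=10, n_bins=50), 3))  # 0.9
```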

In testing the algorithm, another interesting data point was observed: one sample from the test set had an NCV between −5 and −6 for chromosome 21 (FIG. 13). Although this sample was diploid for chromosome 21 by cytogenetics, the karyotype showed mosaicism for trisomy 9: 47,XX,+9[9]/46,XX[6]. Because chromosome 9 is used in the denominator to determine the chromosome dose for chromosome 21 (Table 10), the additional chromosome 9 material inflates the denominator, which lowers the chromosome 21 dose and therefore its NCV. The ability of normalizing chromosome sequences to detect fetal trisomy 9 in this sample is demonstrated by the results provided in Example 7 below.

The conclusion of Fan et al. regarding the sensitivity of these methods is only correct if the algorithms being utilized are able to account for any random or systematic biases introduced by the sequencing method. If the sequencing data are not properly normalized, the resulting analysis will fall short of the limit set by the counting statistics. Chiu et al. noted in their recent paper that their measurement of chromosomes 18 and 13 using the massively parallel sequencing method was imprecise, and concluded that more research was necessary to apply the method to the determination of T18 and T13 (Chiu et al., BMJ 342:c7401 [2011]). The method utilized in the Chiu et al. paper simply uses the number of sequence tags on the chromosome of interest, in their case chromosome 21, normalized by the total number of tags in the sequencing run. The challenge for this approach is that the distribution of tags on each chromosome can vary from sequencing run to sequencing run, which increases the overall variation of the aneuploidy determination metric. To compare the results of the Chiu algorithm with the chromosome doses used in this example, the test data for chromosomes 21 and 18 were reanalyzed using the method recommended by Chiu et al., as shown in FIG. 14. Overall, a compression in the range of NCVs for each of chromosomes 21 and 18 was observed, as well as a decrease in the determination rate, with 10/13 T21 and 5/8 T18 samples correctly identified from our test set using an NCV threshold of 4.0 for aneuploidy classification.
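The difference between the two normalizations can be illustrated with hypothetical counts. The sketch below assumes an idealized run-level bias that scales chromosome 21 and its denominator chromosome 9 by the same factor while leaving all other tags unchanged; real biases are only partly shared between chromosomes, which is why denominators are chosen empirically to minimize dose variability. Under this model the chromosome 21 fraction of total tags (the metric described above for Chiu et al.) varies across runs, whereas the chr21/chr9 dose does not. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def fraction_of_total(counts: Dict[str, int], chrom: str) -> float:
    """Tags on `chrom` divided by all tags in the run."""
    return counts[chrom] / sum(counts.values())


def chromosome_dose(counts: Dict[str, int], chrom: str, denominator: Sequence[str]) -> float:
    """Tags on `chrom` divided by tags on the normalizing chromosome(s)."""
    return counts[chrom] / sum(counts[c] for c in denominator)


def coefficient_of_variation(values: Sequence[float]) -> float:
    return pstdev(values) / mean(values)


# Hypothetical runs: bias factor b scales chr21 and chr9 together; "rest" is unchanged.
runs = [{"21": 1_000 * b, "9": 3_200 * b, "rest": 9_580_000} for b in (90, 100, 110)]
fractions = [fraction_of_total(r, "21") for r in runs]
doses = [chromosome_dose(r, "21", ["9"]) for r in runs]
print(f"CV of chr21 fraction of total: {coefficient_of_variation(fractions):.3%}")
print(f"CV of chr21/chr9 dose:         {coefficient_of_variation(doses):.3%}")
```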

Ehrich et al. also focused only on T21 and used the same algorithm as Chiu et al. (Ehrich et al., Am J Obstet Gynecol 204:205 e1-e11 [2011]). In addition, after observing a shift in their test set z-score metric from the external reference data, i.e. the training set, they retrained on the test set to establish the classification boundaries. Although in principle this approach is feasible, in practice it would be challenging to decide how many samples are required to train and how often one would need to retrain to ensure that the classification boundaries are correct. One method of mitigating this issue is to include controls in every sequencing run that measure the baseline and calibrate for quantitative behavior.

The data obtained using the present method show that massively parallel sequencing is capable of determining multiple fetal chromosomal abnormalities from the plasma of pregnant women when the algorithm for normalizing the chromosome counting data is optimized. The present method for quantification not only minimizes random and systematic variations between sequencing runs, but also allows for effective classification of aneuploidies across the entire genome, most notably T21 and T18. Larger sample collections are required to test the algorithm for T13 determination. To this end, a prospective, blinded, multi-site clinical study to further demonstrate the diagnostic accuracy of the present method is being performed.

Example 7 Determination of the Presence or Absence of at Least 5 Different Chromosomal Aneuploidies in all Chromosomes of Individual Test Samples

To demonstrate the capability of the method to determine the presence or absence of any chromosomal aneuploidy in each of a set of maternal test samples (test set 1; Example 6), systematically determined normalizing chromosome sequences were identified in unaffected samples of the training set (training set 1; Example 6) and used to calculate chromosome doses for all chromosomes in each of the test samples. Determination of the presence or absence of any one or more different complete fetal chromosomal aneuploidies in each of the test and training set samples was accomplished from sequencing information obtained from a single sequencing run on each individual sample.

Using the chromosome densities, i.e. the number of sequence tags identified for each chromosome in each of the samples of the training set described in Example 6, a systematically determined normalizing chromosome sequence, consisting of a single chromosome or a group of chromosomes, was determined by calculating a single chromosome dose for each of chromosomes 1-22, X and Y. The systematically determined normalizing chromosome sequence for each of chromosomes 1-22, X, and Y was determined by systematically calculating chromosome doses for each chromosome using every possible combination of chromosomes as the denominator. For example, for chromosome 21 as the chromosome of interest, chromosome doses were calculated as the ratio of (i) the number of sequence tags obtained for chromosome 21 (chromosome of interest) to (ii) the number of sequence tags obtained for each of the remaining chromosomes, and the sum of the number of tags obtained for all possible combinations of the remaining chromosomes (excluding chromosome 21), i.e. 1, 2, 3, 4, 5, etc. up to 20, 22, X, and Y; 1+2, 1+3, 1+4, 1+5, etc. up to 1+20, 1+22, 1+X, and 1+Y; 1+2+3, 1+2+4, 1+2+5, etc. up to 1+2+20, 1+2+22, 1+2+X, and 1+2+Y; 1+3+4, 1+3+5, 1+3+6, etc. up to 1+3+20, 1+3+22, 1+3+X, and 1+3+Y; 1+2+3+4, 1+2+3+5, 1+2+3+6, etc. up to 1+2+3+20, 1+2+3+22, 1+2+3+X, and 1+2+3+Y; and so on, such that all possible combinations of all of chromosomes 1-20, 22, X and Y were used as a normalizing chromosome sequence (denominator) to determine all possible chromosome doses for each chromosome of interest in each of the qualified (unaffected) samples in the training set. Chromosome doses were determined in the same manner for chromosome 21 in all training samples, and the systematically determined normalizing chromosome sequence for chromosome 21 was determined as the single chromosome or group of chromosomes resulting in a dose for chromosome 21 having the smallest variability across all training samples.
The same analysis was repeated to determine the single chromosome or combination of chromosomes that would serve as the systematically determined normalizing chromosome sequence for each of the remaining chromosomes, including chromosomes 13, 18, X and Y, i.e. all possible combinations of chromosomes were used to determine the normalizing sequence (single chromosome or group of chromosomes) for all other chromosomes of interest 1-12, 14-17, 19-20, 22, X and Y, in all training samples. Thus, all chromosomes were treated as chromosomes of interest, and a systematically determined normalizing sequence was determined for each chromosome in each of the unaffected samples in the training set. Table 13 provides the single chromosome or group of chromosomes identified as the systematically determined normalizing sequence for each of chromosomes of interest 1-22, X, and Y. As highlighted by Table 13, for some chromosomes of interest the systematically determined normalizing chromosome sequence was a single chromosome (e.g. when chromosome 4 is the chromosome of interest), and for others it was a group of chromosomes (e.g. when chromosome 21 is the chromosome of interest).
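The exhaustive search described above can be sketched in Python with `itertools.combinations`. The sketch assumes that "smallest variability" is measured by the coefficient of variation (CV) of the doses across unaffected training samples, consistent with the CVs reported in Table 14; the function name `best_normalizing_sequence` and the toy counts are hypothetical. With 23 candidate chromosomes there are 2²³ − 1 = 8,388,607 non-empty denominators per chromosome of interest, so a full search in pure Python is slow but feasible; the optional `max_group_size` argument limits the search for illustration.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, Optional, Sequence, Tuple


def best_normalizing_sequence(
    samples: Sequence[Dict[str, int]],
    chrom_of_interest: str,
    max_group_size: Optional[int] = None,
) -> Tuple[Tuple[str, ...], float]:
    """Try every non-empty group of the other chromosomes as the denominator and
    return the group whose doses have the lowest CV across `samples`."""
    candidates = sorted(c for c in samples[0] if c != chrom_of_interest)
    largest = len(candidates) if max_group_size is None else max_group_size
    best_group: Tuple[str, ...] = ()
    best_cv = float("inf")
    for size in range(1, largest + 1):
        for group in combinations(candidates, size):
            doses = [s[chrom_of_interest] / sum(s[c] for c in group) for s in samples]
            cv = stdev(doses) / mean(doses)
            if cv < best_cv:  # strict '<' keeps the first group found on ties
                best_group, best_cv = group, cv
    return best_group, best_cv


# Hypothetical unaffected training samples (tag counts keyed by chromosome).
training = [
    {"21": 30, "4": 60, "14": 40, "9": 100},
    {"21": 36, "4": 50, "14": 70, "9": 100},
    {"21": 24, "4": 45, "14": 35, "9": 130},
]
print(best_normalizing_sequence(training, "21"))  # (('14', '4'), 0.0)
```

In this toy data chromosome 21 tracks the sum of chromosomes 4 and 14 exactly, although neither chromosome does on its own, which illustrates why groups of chromosomes can outperform any single denominator.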

TABLE 13
Systematically Determined Normalizing Chromosome Sequences for All Chromosomes

Chromosome of Interest | Systematically Determined Normalizing Sequence
1 | 6 + 10 + 14 + 15 + 17 + 20
2 | 3 + 6 + 8 + 9 + 10
3 | 2 + 4 + 5 + 6 + 12
4 | 5
5 | 4 + 6 + 8 + 14
6 | 3 + 4 + 5 + 12 + 14
7 | 4 + 5 + 8 + 14 + 19 + 20
8 | 2 + 5 + 7
9 | 3 + 4 + 8 + 10 + 17 + 19 + 20 + 22
10 | 2 + 14 + 15 + 17 + 20
11 | 5 + 10 + 14 + 20 + 22
12 | 1 + 2 + 3 + 5 + 6 + 19
13 | 4 + 5
14 | 1 + 3 + 5 + 6 + 10 + 19
15 | 1 + 14 + 20
16 | 14 + 17 + 19 + 20 + 22
17 | 15 + 19 + 22
18 | 2 + 3 + 5 + 7
19 | 22
20 | 10 + 16 + 17 + 22
21 | 4 + 14 + 16 + 20 + 22
22 | 19
X | 4 + 8
Y | 4 + 6

The mean, standard deviation (SD) and coefficient of variation (CV) of the doses obtained with the systematically determined normalizing chromosome sequence for each chromosome are given in Table 14.

TABLE 14
Mean, Standard Deviation and Coefficient of Variation for All Systematically Determined Normalizing Chromosome Sequences

Chromosome of Interest | Mean | SD | CV
1 | 0.36637 | 0.00266 | 0.72%
2 | 0.31580 | 0.00068 | 0.22%
3 | 0.21983 | 0.00055 | 0.18%
4 | 0.98191 | 0.02509 | 2.56%
5 | 0.30109 | 0.00076 | 0.25%
6 | 0.21621 | 0.00059 | 0.27%
7 | 0.21214 | 0.00044 | 0.21%
8 | 0.25562 | 0.00068 | 0.27%
9 | 0.12726 | 0.00034 | 0.27%
10 | 0.24471 | 0.00098 | 0.40%
11 | 0.26907 | 0.00098 | 0.36%
12 | 0.12358 | 0.00029 | 0.23%
13^(a) | 0.26023 | 0.00122 | 0.47%
14 | 0.09286 | 0.00028 | 0.30%
15 | 0.21568 | 0.00147 | 0.68%
16 | 0.25181 | 0.00134 | 0.53%
17 | 0.46000 | 0.00248 | 0.54%
18^(a) | 0.10100 | 0.00038 | 0.38%
19 | 1.43709 | 0.02899 | 2.02%
20 | 0.19967 | 0.00123 | 0.62%
21^(a) | 0.07851 | 0.00053 | 0.67%
22 | 0.69613 | 0.01391 | 2.00%
X^(b) | 0.46865 | 0.00279 | 0.68%
Y^(b) | 0.00028 | 0.00004 | 14.97%

^(a) Excluding trisomies
^(b) Female fetus

The low variability in chromosome doses across all training samples, as reflected by the CV values, supports the use of systematically determined normalizing chromosome sequences to provide a large signal-to-noise ratio and dynamic range, allowing aneuploidies to be determined with high sensitivity and high specificity, as shown in the following.

To demonstrate the sensitivity and specificity of the method, chromosome doses for all chromosomes of interest 1-22, X and Y were determined in each of the samples in the training set, and in each of the samples in the test set described in Example 6, using the corresponding systematically determined normalizing chromosome sequences provided in Table 13 above.

Using the systematically determined normalizing chromosome sequence for each of the chromosomes of interest, the presence or absence of any chromosomal aneuploidy was determined in each of the samples in the training set and in each of the test samples, i.e. it was determined whether each sample contained a complete fetal chromosomal aneuploidy of chromosome 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. Sequence information, i.e. the number of sequence tags, was obtained for all chromosomes in each of the training and test samples, and a single chromosome dose for each chromosome in each sample was calculated as described above, using the number of sequence tags obtained in that same sample for the systematically determined normalizing chromosome sequences identified in the training set (Table 13). To ensure safe and effective classification of aneuploidies, the same conservative boundaries were chosen as described in Example 6.
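A sketch of how a sample's chromosome counts could be turned into NCVs with the Table 13 denominators and the Table 14 training statistics, shown for chromosomes 21, 18 and 13 only. The dictionaries reproduce values from those tables; the function name `ncv_from_counts` and the input layout (tag counts keyed by chromosome name) are hypothetical.

```python
from typing import Dict, List, Tuple

# Table 13 denominators (chromosome of interest -> normalizing chromosomes).
TABLE_13_GROUPS: Dict[str, List[str]] = {
    "21": ["4", "14", "16", "20", "22"],
    "18": ["2", "3", "5", "7"],
    "13": ["4", "5"],
}

# Table 14 training statistics (mean, SD) of the doses, trisomies excluded.
TABLE_14_STATS: Dict[str, Tuple[float, float]] = {
    "21": (0.07851, 0.00053),
    "18": (0.10100, 0.00038),
    "13": (0.26023, 0.00122),
}


def ncv_from_counts(tag_counts: Dict[str, int], chrom: str) -> float:
    """Dose with the Table 13 denominator, standardized with the Table 14 mean and SD."""
    dose = tag_counts[chrom] / sum(tag_counts[c] for c in TABLE_13_GROUPS[chrom])
    mean, sd = TABLE_14_STATS[chrom]
    return (dose - mean) / sd


# Hypothetical sample: 1,000,000 tags on each of chromosomes 4, 14, 16, 20 and 22.
sample = {c: 1_000_000 for c in ("4", "14", "16", "20", "22")}
sample["21"] = 405_800  # dose 0.08116, i.e. 5 SD above the training mean
print(round(ncv_from_counts(sample, "21"), 3))  # 5.0
```

Under the conservative boundaries of Example 6, an NCV of 5.0 for chromosome 21 would be classified as affected.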

Training Set Results

A plot of the chromosome doses for chromosomes 21, 18 and 13 in the training set of samples, obtained using the systematically determined normalizing chromosome sequences, is given in FIG. 15. Using the systematically determined normalizing chromosome sequence for chromosome 21, i.e., the group of chromosomes 4+14+16+20+22, 8 samples with clinical karyotypes indicating T21 had NCVs between 5.4 and 21.5. Using the systematically determined normalizing chromosome sequence for chromosome 18, i.e., the group of chromosomes 2+3+5+7, 4 samples with clinical karyotypes indicating T18 had NCVs between 3.3 and 15.3. Using the systematically determined normalizing chromosome sequence for chromosome 13, i.e., the group of chromosomes 4+5, 2 samples with clinical karyotypes indicating T13 had NCVs of 8.0 and 12.4. The T21 samples of the training set are shown as the last 8 samples of the chromosome 21 data (O); the T18 samples of the training set are shown as the last 4 samples of the chromosome 18 data (Δ); and the T13 samples of the training set are shown as the last 2 samples of the chromosome 13 data (□).

These data show that normalizing chromosome sequences can be used to determine and correctly classify different complete fetal chromosomal aneuploidies with great confidence. Because all samples with affected karyotypes had NCVs greater than 3, the probability that these samples are part of the unaffected distribution is less than approximately 0.1%.
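Probability statements of this kind rest on a model of the unaffected distribution. As a minimal sketch, assuming (our assumption, not stated in this paragraph) that NCVs of unaffected samples follow a standard normal distribution, the one-sided tail probability for a given NCV is 0.5·erfc(NCV/√2). Under that model the tail beyond an NCV of 3 is about 0.13%, and beyond 3.3, the smallest NCV among the affected training samples, it is about 0.05%. The function name `upper_tail_probability` is hypothetical.

```python
import math


def upper_tail_probability(ncv):
    """One-sided probability P(Z >= ncv) for a standard normal Z.

    Assumes NCVs of unaffected samples follow a standard normal distribution.
    """
    return 0.5 * math.erfc(ncv / math.sqrt(2.0))


for ncv in (3.0, 3.3, 5.4):
    print(f"NCV = {ncv}: one-sided tail probability = {upper_tail_probability(ncv):.2e}")
```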

Similarly to the autosomes, when the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 4+8) was used for chromosome X, and the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 4+6) was used for chromosome Y, all of the male and female fetuses in the training set were correctly identified. In addition, all 5 of the monosomy X samples were identified. FIG. 18A shows a plot of the NCVs determined for the X chromosome (X-axis) against the NCVs determined for the Y chromosome (Y-axis) for each of the samples in the training set. All of the samples that are monosomy X by karyotype have X chromosome NCV values of less than −4.83. Those monosomy X samples that have karyotypes consistent with a 45,X karyotype (full or mosaic) have a Y NCV value close to zero, as expected. Female samples cluster around NCV=0 for both X and Y.
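Reading an X-versus-Y NCV plot such as FIG. 18A can be sketched as a simple rule: a clearly positive Y NCV indicates a male fetus, a strongly negative X NCV with no Y signal indicates monosomy X, and NCVs near zero for both indicate a female fetus. This is a hypothetical illustration only; the cutoff of 3 and the function name `classify_sex_chromosomes` are assumptions, not boundaries stated in the source, and other sex chromosome aneuploidies are not handled.

```python
def classify_sex_chromosomes(ncv_x, ncv_y, cutoff=3.0):
    """Hypothetical rule for reading an X-versus-Y NCV plot such as FIG. 18A.

    cutoff is an illustrative assumption, not a boundary stated in the source.
    Other sex chromosome aneuploidies (e.g. XXY) are not handled by this sketch.
    """
    if ncv_y > cutoff:
        # Y NCV well above zero: male fetus.
        return "male"
    if ncv_x < -cutoff:
        # Reduced X dose with no Y signal: consistent with monosomy X (45,X).
        return "monosomy X"
    # Otherwise treated as female; samples with X and Y NCVs near zero fall here.
    return "female"


for x, y in [(0.2, -0.1), (-6.0, 0.3), (-5.0, 40.0)]:
    print(x, y, classify_sex_chromosomes(x, y))
```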

Test Set Results

A plot of the chromosome doses for chromosomes 21, 18 and 13 in the test samples, obtained using the relevant systematically determined normalizing chromosome sequences, is given in FIG. 16. Using the systematically determined normalizing chromosome sequence for chromosome 21 (i.e., the group of chromosomes 4+14+16+20+22), 13 of 13 samples with clinical karyotypes indicating T21 were correctly identified, with NCVs between 7.2 and 16.3. Using the systematically determined normalizing chromosome sequence for chromosome 18 (i.e., the group of chromosomes 2+3+5+7), all 8 samples with clinical karyotypes indicating T18 were identified, with NCVs between 12.7 and 30.7. Using the systematically determined normalizing chromosome sequence for chromosome 13 (i.e., the group of chromosomes 4+5), the only sample with a clinical karyotype indicating T13 was correctly identified, with an NCV of 8.6. The T21 samples of the test set are shown as the last 13 samples of the chromosome 21 data (O); the T18 samples of the test set are shown as the last 8 samples of the chromosome 18 data (Δ); and the T13 sample of the test set is shown as the last sample of the chromosome 13 data (□).

These data show that systematically determined normalizing chromosome sequences can be used to determine and correctly classify different complete fetal chromosomal aneuploidies with great confidence. As in the training set, all samples with affected karyotypes had NCVs greater than 7, which indicates an infinitesimally small probability that these samples are part of the unaffected distribution (FIG. 16).

Similarly to the autosomes, when the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 4+8) was used for chromosome X, and the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 4+6) was used for chromosome Y, all of the male and female fetuses in the test set were correctly identified. In addition, all 3 of the monosomy X samples were identified. FIG. 18B shows a plot of the NCVs determined for the X chromosome (X-axis) against the NCVs determined for the Y chromosome (Y-axis) for each of the samples in the test set.

As previously described, the present method allows the presence or absence of a complete or partial chromosomal aneuploidy of each of chromosomes 1-22, X, and Y to be determined in each sample. In addition to determining the complete chromosomal aneuploidies T13, T18, T21, and monosomy X, the method determined the presence of a trisomy of chromosome 9 in one of the test samples. When the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 3+4+8+10+17+19+20+22) was used for chromosome of interest 9, a sample having an NCV of 14.4 was identified (FIG. 17). This sample corresponded to the test sample in Example 6 that was suspected of being aneuploid for chromosome 9 following the calculation of an aberrantly low dose for chromosome 21 (for which chromosome 9 was used as the normalizing chromosome sequence in Example 6).
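The low chromosome 21 dose in Example 6 follows from the dose being a ratio: when the normalizing chromosome is itself trisomic in the fetus, the denominator is inflated. A minimal sketch, assuming tag counts are proportional to DNA copy number and a known fetal fraction f: maternal DNA contributes two copies and fetal DNA three, so the normalizing chromosome's tags scale by ((1 − f)·2 + f·3)/2 = 1 + f/2, and the dose of an unaffected chromosome of interest shrinks by 1/(1 + f/2). The fetal fraction, expected dose, standard deviation and function names below are hypothetical.

```python
def dose_with_trisomic_normalizer(expected_dose, fetal_fraction):
    """Dose observed when the normalizing chromosome carries a fetal trisomy.

    Assumes tags are proportional to copy number: maternal DNA contributes two copies
    and fetal DNA three, so normalizer tags scale by 1 + f/2 relative to diploid.
    """
    return expected_dose / (1.0 + fetal_fraction / 2.0)


def ncv(dose, mean, sd):
    """Normalized chromosome value, in the form NCV_ij = (x_ij - mu_j) / sigma_j."""
    return (dose - mean) / sd


# Hypothetical values: unaffected chr21 dose 0.30 (SD 0.0008), 10% fetal fraction.
observed = dose_with_trisomic_normalizer(0.30, 0.10)
print(f"observed dose = {observed:.4f}, NCV = {ncv(observed, 0.30, 0.0008):.1f}")
```

With these assumed numbers a dose drop of under 5% corresponds to a strongly negative NCV, which is why a trisomic normalizer can masquerade as a loss of the chromosome of interest.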

The data show that 100% of the samples having clinical karyotypes indicating T21, T13, T18, T9 and monosomy X were correctly identified. FIG. 19 shows a plot of the NCVs for each of chromosomes 1-22 in each of the 47 test samples; the medians of the NCVs were normalized to zero. The data show that the method of the invention (including the use of systematically determined normalizing chromosome sequences) determined the presence of all 5 types of chromosomal aneuploidy present in this test set with 100% sensitivity and 100% specificity, and indicate that the method can identify any complete chromosomal aneuploidy for any one of chromosomes 1-22, X, and Y, in any sample.
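The median normalization and the sensitivity and specificity figures can be sketched as follows: NCVs for a chromosome are shifted so that their median is zero, samples are called aneuploid when the centered NCV exceeds a cutoff in absolute value, and the calls are compared with the karyotypes. The cutoff of 3, the NCV values and the function names are assumptions for illustration.

```python
import statistics


def median_center(ncvs):
    """Shift NCVs for one chromosome so that their median is zero."""
    median = statistics.median(ncvs)
    return [value - median for value in ncvs]


def sensitivity_specificity(called, affected):
    """Sensitivity and specificity from paired booleans (True = aneuploid)."""
    pairs = list(zip(called, affected))
    tp = sum(1 for c, a in pairs if c and a)
    fn = sum(1 for c, a in pairs if not c and a)
    tn = sum(1 for c, a in pairs if not c and not a)
    fp = sum(1 for c, a in pairs if c and not a)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical NCVs for one chromosome across seven samples; the last two are affected.
ncvs = [0.4, -0.6, 1.1, 0.0, -1.2, 8.6, 14.4]
affected = [False, False, False, False, False, True, True]
centered = median_center(ncvs)
called = [abs(v) > 3.0 for v in centered]  # assumed cutoff of 3
print(sensitivity_specificity(called, affected))
```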

Example 8 Determination of the Presence or Absence of a Partial Fetal Chromosomal Aneuploidy: Determination of Cat Eye Syndrome

DiGeorge syndrome (22q11.2 deletion syndrome), a disorder caused by a defect in chromosome 22, results in the poor development of several body systems. Medical problems commonly associated with DiGeorge syndrome include heart defects, poor immune system function, a cleft palate, poor function of the parathyroid glands, and behavioral disorders. The number and severity of problems associated with DiGeorge syndrome vary greatly. Almost everyone with DiGeorge syndrome needs treatment from specialists in a variety of fields.

To determine the presence or absence of a partial deletion of fetal chromosome 22, a blood sample is obtained from the mother by venipuncture, and cfDNA is prepared as described in the Examples above. The purified cfDNA is ligated to adaptors and subjected to cluster amplification using the Illumina cBot cluster station. Massively parallel sequencing is performed using reversible dye terminators to generate millions of 36 bp reads. The sequence reads are aligned to the human hg19 reference genome, and the reads that map uniquely to the reference genome are counted as tags.

A set of qualified samples all known to be diploid for chromosome 22, i.e., samples in which chromosome 22 and every portion thereof is known to be present only in a diploid state, is first sequenced and analyzed to obtain the number of sequence tags for each of 1000 segments of 3 megabases (Mb) (excluding the region 22q11.2). Given that the human genome comprises approximately 3 billion bases (3 Gb), the 1000 segments of 3 Mb each approximately compose the remainder of the genome. Each of the 1000 segments can serve individually, or as part of a group of segment sequences, to determine the normalizing segment sequence for the segment of interest, i.e., the 3 Mb region of 22q11.2. The number of sequence tags mapped to each single 3 Mb segment is used individually to compute segment doses for the 3 Mb region of 22q11.2. In addition, all possible combinations of two or more segments are used to determine segment doses for the segment of interest in all qualified samples. The single 3 Mb segment, or the combination of two or more 3 Mb segments, that results in the segment dose having the lowest variability across samples is chosen as the normalizing segment sequence.
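The selection step above can be sketched as an exhaustive search over candidate segment combinations, scoring each by the CV of the resulting segment dose across qualified samples and keeping the lowest. Enumerating every combination of 1000 segments is not computationally feasible, so this sketch caps the combination size with `max_size`, a hypothetical parameter; it also assumes CV as the variability measure and summed tags for a group of segments. Segment labels, tag counts and function names are hypothetical.

```python
import itertools
import statistics


def segment_doses(samples, target, normalizer):
    """Per-sample dose: tags in the target segment / summed tags in the normalizer segments."""
    return [s[target] / sum(s[seg] for seg in normalizer) for s in samples]


def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)


def best_normalizing_segments(samples, target, candidates, max_size=2):
    """Return (combination, CV) for the candidate combination whose target-segment dose
    varies least across the qualified samples.

    max_size caps the combination size; it is a hypothetical parameter of this sketch.
    """
    best = None
    for size in range(1, max_size + 1):
        for combo in itertools.combinations(candidates, size):
            score = coefficient_of_variation(segment_doses(samples, target, combo))
            if best is None or score < best[1]:
                best = (combo, score)
    return best


# Hypothetical tag counts in three qualified samples for 22q11.2 and three 3 Mb segments.
qualified_samples = [
    {"22q11.2": 100, "A": 1000, "B": 500, "C": 800},
    {"22q11.2": 120, "A": 1200, "B": 510, "C": 700},
    {"22q11.2": 90, "A": 900, "B": 490, "C": 900},
]
print(best_normalizing_segments(qualified_samples, "22q11.2", ["A", "B", "C"]))
```

In this toy data segment A rises and falls exactly with 22q11.2 across samples, so normalizing by A cancels the sample-to-sample variation and gives the lowest CV.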

The number of sequence tags mapped to the segment of interest in each of the qualified samples is used to determine a segment dose in each of the qualified samples. The mean and standard deviation of the segment doses in all qualified samples are calculated and used to set thresholds to which segment doses determined in test samples can be compared. Preferably, normalized segment values (NSVs) are calculated for all segments of interest in all qualified samples and used to set the threshold values.

Subsequently, the number of tags mapped to the normalizing segment sequence in the test sample, together with the number of tags mapped to the segment of interest, is used to determine the dose of the segment of interest in the test sample. A normalized segment value (NSV) is calculated for the segment in the test sample as described previously, and the NSV of the segment of interest in the test sample is compared to the threshold determined using the qualified samples to determine the presence or absence of a deletion of 22q11.2 in the test sample.

A test NSV < −3 indicates that a loss in the segment of interest, i.e., a partial deletion of chromosome 22 (22q11.2), is present in the test sample.
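A minimal sketch of the NSV comparison, assuming the NSV takes the same form as the NCV in the claims, i.e., the test segment dose minus the mean of the qualified segment doses, divided by their standard deviation, with a deletion called below −3. The dose values and function names are hypothetical.

```python
import statistics


def normalized_segment_value(test_dose, qualified_doses):
    """NSV = (test dose - mean of qualified doses) / standard deviation of qualified doses."""
    mu = statistics.mean(qualified_doses)
    sigma = statistics.stdev(qualified_doses)
    return (test_dose - mu) / sigma


def call_22q11_2_deletion(nsv, threshold=-3.0):
    """Flag a 22q11.2 loss when the NSV falls below the threshold of -3."""
    return nsv < threshold


# Hypothetical 22q11.2 segment doses in five qualified (diploid) samples.
qualified = [0.0500, 0.0502, 0.0498, 0.0501, 0.0499]
for dose in (0.0480, 0.0501):
    nsv = normalized_segment_value(dose, qualified)
    print(f"dose={dose}: NSV={nsv:.2f}, deletion={call_22q11_2_deletion(nsv)}")
```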

Example 9 Stool DNA Testing for Prediction of Outcome for Stage II Colorectal Cancer Patients

Around 30% of all stage II colon cancer patients will relapse and die of their disease. Stage II colon cancers of patients who had relapse of disease showed significantly more losses on chromosomes 4, 5, 15q, 17q and 18q. In particular, losses on 4q22.1-4q35.2 in stage II colon cancer patients have been shown to be associated with worse outcome. Determination of the presence or absence of these genomic alterations may aid in selecting patients for adjuvant therapy (Brosens et al., Analytical Cellular Pathology/Cellular Oncology 33: 95-104 [2010]).

To determine the presence or absence of one or more chromosomal deletions in the 4q22.1 to 4q35.2 region in patients with stage II colorectal cancer, stool and/or plasma samples are obtained from the patient(s). Stool DNA is prepared according to the method described by Chen et al., J Natl Cancer Inst 97:1124-1132 [2005], and plasma DNA is prepared according to the method described in the Examples above. DNA is sequenced according to an NGS method described herein, and the sequence information for the patient sample(s) is used to calculate segment doses for one or more segments spanning the 4q22.1 to 4q35.2 region. Segment doses are determined using normalizing segment sequences that are determined a priori in a set of qualified stool and/or plasma samples, respectively. Segment doses in the test samples (patient samples) are calculated, and the presence or absence of one or more partial chromosomal deletions within the 4q22.1 to 4q35.2 region is determined by comparing the NSV for each of the segments of interest to the threshold set from the NSVs in the set of qualified samples.
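Scanning several segments across a region can be sketched by computing an NSV for each segment against qualified references of the same sample type (stool or plasma) and reporting the segments below −3. This is a minimal illustration, assuming the NSV form used for the NCV in the claims; the segment labels, dose values and the function name `flag_segment_losses` are hypothetical.

```python
import statistics


def flag_segment_losses(test_doses, qualified_doses, threshold=-3.0):
    """Return {segment: NSV} for every segment whose NSV falls below threshold.

    test_doses maps segment labels to the patient's segment doses; qualified_doses maps the
    same labels to lists of doses from qualified samples of the same type (stool or plasma).
    """
    losses = {}
    for segment, dose in test_doses.items():
        reference = qualified_doses[segment]
        nsv = (dose - statistics.mean(reference)) / statistics.stdev(reference)
        if nsv < threshold:
            losses[segment] = nsv
    return losses


# Hypothetical qualified plasma references for two segments within 4q22.1-4q35.2.
qualified_plasma = {
    "4q_segment_1": [0.100, 0.101, 0.099, 0.1005, 0.0995],
    "4q_segment_2": [0.200, 0.202, 0.198, 0.201, 0.199],
}
patient_plasma = {"4q_segment_1": 0.095, "4q_segment_2": 0.2005}
print(flag_segment_losses(patient_plasma, qualified_plasma))
```

Keeping the stool and plasma references separate mirrors the text's point that normalizing segment sequences are determined in qualified samples of each type respectively.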

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

1-6. (canceled)
 7. In a method of performing prenatal diagnosis using massively parallel sequencing of a mixture of fetal and maternal cell-free DNA extracted from the plasma portion of a human maternal test blood sample, the method including steps of extraction, purification, library preparation and cluster amplification, each of which can cause variation of the number of sequence tags that map to the reference genome within and between sequencing runs, the improvement comprising: (a) performing massively parallel sequencing of at least a portion of said fetal and maternal nucleic acids in said test sample to obtain sequence information, comprising sequence information for any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and for a normalizing sequence for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, said normalizing sequence having previously been systematically determined to minimize the variation of the chromosome doses within and between sequencing runs; (b) receiving said sequence information in a computer readable medium; (c) using computer readable logic, utilizing said sequence information to identify a number of sequence tags for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and to identify a number of sequence tags for said normalizing sequence; (d) using computer readable logic, utilizing said number of sequence tags identified for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and said number of sequence tags identified for said normalizing sequence obtained in step (d) to calculate a chromosome dose for said one or more chromosome of interest selected from chromosomes 1-22, X, and Y; and (e) calculating a differential between the chromosome dose for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and a threshold value, wherein a statistically significant differential indicates the presence or absence of complete fetal chromosomal aneuploidy for said one or more chromosome of interest selected from chromosomes 1-22, X, and Y.
 8. The method of claim 7, wherein step (e) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest and the number of sequence tags identified for said normalizing sequence for each of said chromosomes of interest.
 9. The method of claim 7, wherein step (e) comprises: (i) calculating a sequence tag density ratio for each of said chromosomes of interest, by relating the number of sequence tags identified for each of said chromosomes of interest in step (b) to the length of each of said chromosomes of interest; (ii) calculating a sequence tag density ratio for each of said normalizing sequence by relating the number of sequence tags identified for said normalizing sequence in step (d) to the length of each said normalizing sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein said chromosome dose is calculated as a ratio of the sequence tag density ratio for each of said chromosomes of interest and the sequence tag density ratio for said normalizing sequence for each of said chromosomes of interest.
 10. The method of claim 7, wherein each said normalizing sequence for each of said chromosomes of interest is determined by a method comprising systematically calculating multiple chromosome doses for each of said chromosomes of interest in a set of qualified samples.
 11. The method of claim 7, wherein steps (a)-(e) are repeated for test samples from different maternal subjects, and wherein the method comprises determining the presence or absence of complete fetal chromosomal aneuploidy for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y in each of said samples.
 12. The method of claim 7, further comprising calculating a normalized chromosome value (NCV), wherein said NCV relates said chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as: $NCV_{ij} = \frac{x_{ij} - \hat{\mu}_{j}}{\hat{\sigma}_{j}}$ where $\hat{\mu}_{j}$ and $\hat{\sigma}_{j}$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
 13. The method of claim 7, wherein said one or more chromosome of interest comprises chromosome 13, 18, 21, X, and/or Y.
 14. The method of claim 7, wherein said normalizing sequence is a single chromosome.
 15. The method of claim 7, wherein said normalizing sequence is a group of chromosomes.
 16. The method of claim 7, wherein said sequencing comprises: sequencing-by-hybridization; sequencing-by-synthesis with reversible dye terminators; sequencing-by-ligation; or single molecule sequencing.
 17. The method of claim 7, wherein said cell free DNA is amplified using PCR before it is subject to cluster amplification.
 18. A method comprising: (a) preparing a sequencing library from a mixture of fetal and maternal nucleic acid molecules in a maternal test sample, wherein preparing said library comprises the consecutive steps of end-repairing, dA-tailing and adaptor ligating said nucleic acids, wherein said consecutive steps exclude purifying the end-repaired products prior to said dA-tailing and exclude purifying the dA-tailing products prior to said adaptor-ligating; (b) performing massively parallel sequencing of at least a portion of said fetal and maternal nucleic acids in said test sample to obtain sequence information, comprising sequence information for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and for a normalizing sequence for each said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, said normalizing sequence having previously been systematically determined to minimize the variation of the chromosome doses within and between sequencing runs; (c) receiving said sequence information in a computer readable medium; (d) using computer readable logic, utilizing said sequence information to identify a number of sequence tags for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and to identify a number of sequence tags for said normalizing sequence; (e) using computer readable logic, utilizing said number of sequence tags identified for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and said number of sequence tags identified for said normalizing sequence obtained in step (c) to calculate a chromosome dose for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y; and (f) calculating a differential between the chromosome dose for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y and a threshold value, wherein a statistically significant differential indicates the presence or absence of complete fetal chromosomal aneuploidy for said one or more chromosomes of interest selected from chromosomes 1-22, X, and Y.
 19. The method of claim 18, wherein step (f) comprises calculating a single chromosome dose for each of said chromosomes of interest as the ratio of the number of sequence tags identified for each of said chromosomes of interest and the number of sequence tags identified for said normalizing sequence for each of said chromosomes of interest.
 20. The method of claim 18, wherein step (f) comprises: (i) calculating a sequence tag density ratio for each of said chromosomes of interest, by relating the number of sequence tags identified for each of said chromosomes of interest in step (b) to the length of each of said chromosomes of interest; (ii) calculating a sequence tag density ratio for each said normalizing sequence by relating the number of sequence tags identified for said normalizing sequence in step (e) to the length of each said normalizing sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each of said chromosomes of interest, wherein said chromosome dose is calculated as a ratio of the sequence tag density ratio for each of said chromosomes of interest and the sequence tag density ratio for said normalizing sequence for each of said chromosomes of interest.
 21. The method of claim 18, wherein each said normalizing sequence for each of said chromosomes of interest is determined by a method comprising systematically calculating multiple chromosome doses for each of said chromosomes of interest in a set of qualified samples.
 22. The method of claim 18, further comprising calculating a normalized chromosome value (NCV), wherein said NCV relates said chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as: $NCV_{ij} = \frac{x_{ij} - \hat{\mu}_{j}}{\hat{\sigma}_{j}}$ where $\hat{\mu}_{j}$ and $\hat{\sigma}_{j}$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
 23. The method of claim 18, wherein said one or more chromosomes of interest comprises chromosome 13, 18, 21, X, and/or Y.
 24. The method of claim 18, wherein said normalizing sequence is a single chromosome.
 25. The method of claim 18, wherein said normalizing sequence is a group of chromosomes.